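HBase write path
- The client first contacts ZooKeeper to locate the hbase:meta table, and from hbase:meta finds the RegionServer that hosts the target Region
- The client sends the write request to that RegionServer, which appends the data to its WAL and then writes it into memory (the MemStore)
- When the MemStore reaches a configured size (hbase.hregion.memstore.flush.size), it is flushed to a StoreFile (HFile) stored on HDFS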
How the WAL (write-ahead log) is implemented in HBase: https://www.cnblogs.com/ohuang/p/5807543.html
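The write steps above can be sketched as a toy, single-Region model in plain Java. It is not HBase code: `ToyRegionWriter`, its string-length size accounting and the flush threshold are hypothetical simplifications. It only illustrates the ordering: an edit goes to the WAL first, then into the sorted MemStore, and the MemStore is flushed to a new immutable StoreFile once it reaches the threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the write path for a single Region: WAL append -> MemStore -> flush.
// Hypothetical class, not HBase code; sizes are approximated by string lengths.
class ToyRegionWriter {
    private final long flushSizeBytes;  // plays the role of hbase.hregion.memstore.flush.size
    final List<String> wal = new ArrayList<>();               // append-only write-ahead log
    TreeMap<String, String> memStore = new TreeMap<>();       // sorted in-memory buffer
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // flushed, immutable files
    private long memStoreBytes = 0;

    ToyRegionWriter(long flushSizeBytes) {
        this.flushSizeBytes = flushSizeBytes;
    }

    void put(String rowKey, String value) {
        wal.add(rowKey + "=" + value);    // 1. log the edit first so it could be replayed after a crash
        memStore.put(rowKey, value);      // 2. then apply it to the MemStore
        memStoreBytes += rowKey.length() + value.length();
        if (memStoreBytes >= flushSizeBytes) {
            flush();                      // 3. threshold reached: persist as a new StoreFile
        }
    }

    void flush() {
        if (memStore.isEmpty()) return;
        storeFiles.add(memStore);         // the current MemStore becomes a StoreFile
        memStore = new TreeMap<>();
        memStoreBytes = 0;
    }
}
```

Because the MemStore is a sorted map, each flushed StoreFile is already ordered by rowkey, which is what lets StoreFiles be written sequentially.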
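HBase read path
- The client first contacts ZooKeeper to locate the hbase:meta table, and from it the RegionServer that hosts the target Region
- The client sends the read request to that RegionServer
- On receiving the read request, the RegionServer first checks its MemStore, then the BlockCache (an in-memory cache of StoreFile blocks that speeds up reads); if the data is still not found, it reads from the StoreFiles, and finally returns the result to the client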
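The read order above can be sketched as a toy model under stated simplifications: each StoreFile is treated as a single block, and the hypothetical `blockCache` set only records which StoreFiles have already been loaded into memory. Real HBase caches individual HFile blocks and merges MemStore and StoreFile scanners rather than returning the first hit; all names here are illustrative.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Toy model of the read order: MemStore first, then BlockCache, then StoreFiles on disk.
// Simplifications: each StoreFile counts as one block, and the newest matching value wins.
class ToyRegionReader {
    final TreeMap<String, String> memStore = new TreeMap<>();
    final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // index 0 = oldest
    final Set<Integer> blockCache = new HashSet<>();  // StoreFiles whose block is already in memory
    String lastSource = "none";                      // where the last get() found its value
    int diskReads = 0;                               // simulated HDFS reads

    String get(String rowKey) {
        String v = memStore.get(rowKey);
        if (v != null) {
            lastSource = "memstore";
            return v;
        }
        for (int i = storeFiles.size() - 1; i >= 0; i--) {   // newest StoreFile first
            boolean cached = blockCache.contains(i);
            if (!cached) {
                diskReads++;        // block not cached: read it from disk...
                blockCache.add(i);  // ...and keep it in the BlockCache for later reads
            }
            v = storeFiles.get(i).get(rowKey);
            if (v != null) {
                lastSource = cached ? "blockcache" : "storefile";
                return v;
            }
        }
        lastSource = "none";
        return null;
    }
}
```

Reading the same key twice shows the point of the BlockCache: the first read costs a disk read, the second is served from memory.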
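How HBase components work together
What happens when HBase starts?
- The HMaster starts, registers with ZooKeeper, and waits for RegionServers to report in
- Each RegionServer registers with ZooKeeper and reports to the HMaster
- The HMaster reconciles the data on every RegionServer (including failed ones) and assigns Regions and the meta information
What happens when a RegionServer fails?
- The HMaster reassigns the Regions of the failed RegionServer to other nodes
- The HMaster updates the hbase:meta table so that the data remains accessible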
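The two failover steps can be sketched with a toy master that keeps a region-to-server table standing in for hbase:meta. `ToyMaster` and its round-robin placement are assumptions for illustration only; the real HMaster uses its AssignmentManager and balancer, and also has the dead server's WAL split and replayed so no acknowledged writes are lost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy failover: move every Region off a dead RegionServer and update the
// region -> server table, which plays the role of hbase:meta here.
class ToyMaster {
    final Map<String, String> meta = new TreeMap<>();   // region name -> hosting server
    final List<String> liveServers = new ArrayList<>();

    void onServerDead(String dead) {
        liveServers.remove(dead);
        if (liveServers.isEmpty()) throw new IllegalStateException("no live RegionServer");
        int next = 0;
        for (Map.Entry<String, String> e : meta.entrySet()) {
            if (e.getValue().equals(dead)) {
                // Round-robin over the surviving servers (hypothetical placement policy).
                e.setValue(liveServers.get(next % liveServers.size()));
                next++;
            }
        }
    }
}
```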
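What happens when the HMaster fails?
- With high availability configured: one of the other HMaster nodes in Backup state is elected and becomes Active
- Without high availability: data can still be read and written, but tables cannot be created or dropped, and table schemas cannot be altered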
HBase operations
HBase Shell commands
1 View the cluster status: status
hbase(main):001:0> status
1 active master, 0 backup masters, 1 servers, 1 dead, 6.0000 average load
Took 0.7064 seconds
2 List all tables: list
hbase(main):003:0> list
TABLE
coures_clickcount
course_search_clickcount
member
table1
4 row(s)
Took 0.0236 seconds
=> ["coures_clickcount", "course_search_clickcount", "member", "table1"]
3 Create a table: create 'tableName','columnFamily'
Create a table named FileTable with two column families, fileInfo and saveInfo
hbase(main):004:0> create 'FileTable','fileInfo','saveInfo'
Created table FileTable
Took 0.8806 seconds
=> Hbase::Table - FileTable
4 Describe a table: desc 'tableName'
hbase(main):005:0> desc 'FileTable'
Table FileTable is ENABLED
FileTable
COLUMN FAMILIES DESCRIPTION
{NAME => 'fileInfo', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
{NAME => 'saveInfo', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
2 row(s)
Took 0.3854 seconds
5 Alter a table's schema: alter
Add a column family: alter 'tableName','newColumnFamily'
alter 'FileTable','test'
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 1.8539 seconds
Delete a column family: alter 'tableName',{NAME=>'columnFamily',METHOD=>'delete'}
alter 'FileTable',{NAME=>'test',METHOD=>'delete'}
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 1.8177 seconds
6 Insert data: put 'tableName','rowkey','columnFamily:field','value'
hbase(main):008:0> put 'FileTable','rowkey1','fileInfo:name','file1.txt'
Took 0.1394 seconds
hbase(main):009:0> put 'FileTable','rowkey1','fileInfo:type','txt'
Took 0.0041 seconds
hbase(main):010:0> put 'FileTable','rowkey1','fileInfo:size','1024'
Took 0.0048 seconds
hbase(main):011:0> put 'FileTable','rowkey1','saveInfo:path','/home'
Took 0.0052 seconds
hbase(main):012:0> put 'FileTable','rowkey1','saveInfo:creator','srk'
Took 0.0061 seconds
7 Count the rows in a table: count 'tableName'
hbase(main):013:0> count 'FileTable'
1 row(s)
Took 0.0676 seconds
=> 1
8 Query data: get
Get a row by rowkey: get 'tableName','rowKey'
hbase(main):005:0> get 'FileTable','rowkey1'
COLUMN CELL
fileInfo:name timestamp=1542166201745, value=file1.txt
fileInfo:size timestamp=1542166236481, value=1024
fileInfo:type timestamp=1542166217417, value=txt
saveInfo:creator timestamp=1542166283601, value=srk
saveInfo:path timestamp=1542166258675, value=/home
1 row(s)
Took 0.0281 seconds
Get one column family of a row: get 'tableName','rowKey','columnFamily'
hbase(main):006:0> get 'FileTable','rowkey1','fileInfo'
COLUMN CELL
fileInfo:name timestamp=1542166201745, value=file1.txt
fileInfo:size timestamp=1542166236481, value=1024
fileInfo:type timestamp=1542166217417, value=txt
1 row(s)
Took 0.0260 seconds
Get one column (field) of a column family in a row: get 'tableName','rowKey','columnFamily:field'
hbase(main):008:0> get 'FileTable','rowkey1','fileInfo:name'
COLUMN CELL
fileInfo:name timestamp=1542166201745, value=file1.txt
1 row(s)
Took 0.0169 seconds
9 Scan a table: scan
View all data in a table: scan 'tableName'
hbase(main):009:0> scan 'FileTable'
ROW COLUMN+CELL
rowkey1 column=fileInfo:name, timestamp=1542166201745, value=file1.txt
rowkey1 column=fileInfo:size, timestamp=1542166236481, value=1024
rowkey1 column=fileInfo:type, timestamp=1542166217417, value=txt
rowkey1 column=saveInfo:creator, timestamp=1542166283601, value=srk
rowkey1 column=saveInfo:path, timestamp=1542166258675, value=/home
1 row(s)
Took 0.0905 seconds
View all data in one column family: scan 'tableName',{COLUMN=>'columnFamily'}
hbase(main):010:0> scan 'FileTable',{COLUMN=>'fileInfo'}
ROW COLUMN+CELL
rowkey1 column=fileInfo:name, timestamp=1542166201745, value=file1.txt
rowkey1 column=fileInfo:size, timestamp=1542166236481, value=1024
rowkey1 column=fileInfo:type, timestamp=1542166217417, value=txt
1 row(s)
Took 0.0191 seconds
View all data for one column of a column family: scan 'tableName',{COLUMN=>'columnFamily:field'}
hbase(main):011:0> scan 'FileTable',{COLUMN=>'fileInfo:name'}
ROW COLUMN+CELL
rowkey1 column=fileInfo:name, timestamp=1542166201745, value=file1.txt
1 row(s)
Took 0.0051 seconds
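scan options: STARTROW, LIMIT, VERSIONS, ENDROW (STARTROW is inclusive and ENDROW is exclusive; LIMIT caps the number of rows returned; VERSIONS sets how many versions of each cell are returned)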
hbase(main):015:0> scan 'FileTable',{STARTROW=>'rowkey1',ENDROW=>'rowkey2',LIMIT=>2,VERSIONS=>1}
ROW COLUMN+CELL
rowkey1 column=fileInfo:name, timestamp=1542166201745, value=file1.txt
rowkey1 column=fileInfo:size, timestamp=1542166236481, value=1024
rowkey1 column=fileInfo:type, timestamp=1542166217417, value=txt
rowkey1 column=saveInfo:creator, timestamp=1542166283601, value=srk
rowkey1 column=saveInfo:path, timestamp=1542166258675, value=/home
1 row(s)
Took 0.0093 seconds
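To make the scan options concrete, here is a toy in-memory table that mimics their semantics, assuming the behavior described above: rows are kept sorted by rowkey, STARTROW is inclusive, ENDROW is exclusive, LIMIT counts rows and VERSIONS caps the versions returned per column. `ToyTable` is hypothetical and is not the HBase client API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of the HBase data model: row -> "family:qualifier" -> timestamp -> value.
class ToyTable {
    private final TreeMap<String, TreeMap<String, TreeMap<Long, String>>> rows = new TreeMap<>();

    void put(String row, String column, long ts, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<String, TreeMap<Long, String>>())
            // timestamps sorted in descending order so the newest version comes first
            .computeIfAbsent(column, c -> new TreeMap<Long, String>(Collections.<Long>reverseOrder()))
            .put(ts, value);
    }

    // Returns "row/column/value" strings: rows in rowkey order, versions newest first.
    List<String> scan(String startRow, String endRow, int limit, int versions) {
        List<String> out = new ArrayList<>();
        // startRow inclusive, endRow exclusive
        NavigableMap<String, TreeMap<String, TreeMap<Long, String>>> range =
            rows.subMap(startRow, true, endRow, false);
        int rowCount = 0;
        for (Map.Entry<String, TreeMap<String, TreeMap<Long, String>>> row : range.entrySet()) {
            if (rowCount++ >= limit) break;                     // LIMIT counts rows
            for (Map.Entry<String, TreeMap<Long, String>> col : row.getValue().entrySet()) {
                int n = 0;
                for (String value : col.getValue().values()) {
                    if (n++ >= versions) break;                 // VERSIONS per column
                    out.add(row.getKey() + "/" + col.getKey() + "/" + value);
                }
            }
        }
        return out;
    }
}
```

This is why the scan in the session above, with ENDROW=>'rowkey2', returned only rowkey1: the end row itself is excluded.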
10 Delete data
Delete one column: delete 'tableName','rowKey','columnFamily:field'
hbase(main):016:0> delete 'FileTable','rowkey1','fileInfo:size'
Took 0.0628 seconds
hbase(main):017:0> get 'FileTable','rowkey1','fileInfo:size'
COLUMN CELL
0 row(s)
Took 0.0113 seconds
Delete an entire row: deleteall 'tableName','rowKey'
hbase(main):023:0> deleteall 'FileTable','rowkey1'
Took 0.0118 seconds
hbase(main):024:0> get 'FileTable','rowkey1'
COLUMN CELL
0 row(s)
Took 0.0156 seconds
11 Drop a table
Disable the table first, then drop it:
disable 'tableName'
drop 'tableName'
hbase(main):025:0> disable 'FileTable'
Took 1.3645 seconds
hbase(main):026:0> is_enabled 'FileTable'
false
Took 0.0067 seconds
=> false
hbase(main):027:0> is_disabled 'FileTable'
true
Took 0.0077 seconds
=> 1
hbase(main):028:0> drop 'FileTable'
Took 0.4925 seconds
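Using HBase from Java
Advanced HBase
HBase optimization strategies
What degrades HBase performance:
- JVM memory allocation and GC strategy
- Unsuitable values for configuration options tied to how HBase runs
- Poor table schema design and poor client usage patterns
HBase server-side optimization
JVM and GC settings
Selected hbase-site.xml properties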
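Property | Description | Default |
---|---|---|
hbase.regionserver.handler.count | Number of threads that handle RPC requests | 10 |
hbase.hregion.max.filesize | Once a Region grows larger than this value, HBase splits it | 10G |
hbase.hregion.majorcompaction | Interval between major compactions | 604800000 ms (7 days); often set to 0 to disable periodic major compactions and run them manually |
hbase.hstore.compaction.min | Minimum number of StoreFiles in a Store needed to trigger a (minor) compaction | 3 |
hbase.hstore.compaction.max | Maximum number of StoreFiles merged in a single compaction | |
hbase.hstore.blockingStoreFiles | If a Store (ColumnFamily) in a Region holds more than this many StoreFiles, all writes to that Region are blocked while a compaction runs | |
hfile.block.cache.size | Size limit of the RegionServer's BlockCache, as a fraction of the heap | |
hbase.hregion.memstore.flush.size | A MemStore is flushed once it exceeds this size | |
hbase.hregion.memstore.block.multiplier | If a MemStore grows beyond flush.size * multiplier, writes to it are blocked | |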
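The last two rows of the table interact: a flush is requested past `flush.size`, and writes are blocked past `flush.size * multiplier`. A minimal sketch of that decision, where `MemStoreThresholds` is a hypothetical helper and the 128 MB / 4 values in the test are example inputs (check your version's hbase-default.xml for the actual defaults):

```java
// Sketch of how the two MemStore settings in the table combine for one Region.
class MemStoreThresholds {
    enum State { OK, FLUSH, BLOCK_WRITES }

    static State check(long memStoreBytes, long flushSize, int blockMultiplier) {
        long blockingSize = flushSize * blockMultiplier;   // flush.size * multiplier
        if (memStoreBytes > blockingSize) return State.BLOCK_WRITES; // writes wait for flushes to catch up
        if (memStoreBytes > flushSize) return State.FLUSH;           // flush requested, writes continue
        return State.OK;
    }
}
```

With a flush size of 128 MB and a multiplier of 4, a 200 MB MemStore only triggers a flush, while writes are blocked once it passes 512 MB.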
Common HBase optimizations
- Pre-splitting
- RowKey optimization
- Column optimization
- Schema optimization
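Pre-splitting
By default, creating an HBase table creates only one Region
Pre-splitting creates a number of empty Regions when the table is created, so that writes are spread across several RegionServers from the start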
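Pre-splitting needs N-1 split keys for N Regions. A minimal sketch, assuming (hypothetically) that rowkeys start with a two-character lowercase hex prefix from "00" to "ff", computes evenly spaced boundaries over that prefix space; `PreSplit.hexSplitKeys` is an illustrative helper, not part of HBase.

```java
import java.util.ArrayList;
import java.util.List;

// Evenly spaced split keys over a two-character hex rowkey prefix ("00".."ff").
class PreSplit {
    static List<String> hexSplitKeys(int regions) {
        if (regions < 1 || regions > 256) throw new IllegalArgumentException("regions must be 1..256");
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            int boundary = i * 256 / regions;           // first prefix value of Region i
            keys.add(String.format("%02x", boundary));
        }
        return keys;                                    // N Regions need N-1 split keys
    }
}
```

For example, `hexSplitKeys(4)` returns `[40, 80, c0]`; in the HBase shell such keys can be passed when the table is created, e.g. `create 'tableName','columnFamily',SPLITS=>['40','80','c0']`. Even spacing only helps if the rowkeys really are spread uniformly over the prefix.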
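Rowkey optimization
- Take advantage of HBase's default sort order (rows are stored sorted by rowkey) to place data that is read together next to each other
- Avoid hotspots: do not use timestamps or other monotonically increasing or decreasing values as rowkeys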
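One common way to follow the hotspot rule is salting: prefix each rowkey with a small bucket number derived from a hash of the key, so that consecutive keys land in different Regions. This is a minimal sketch; `RowKeys.salted`, the `%02d_` format and the bucket count are hypothetical choices. The trade-off is that a range scan over the original keys then has to be issued once per bucket.

```java
// Salting: a hash-derived bucket prefix spreads sequential keys across the key space.
class RowKeys {
    static String salted(String key, int buckets) {
        if (buckets < 1 || buckets > 100) throw new IllegalArgumentException("buckets must be 1..100");
        // Deterministic, so a reader can recompute the bucket for a point get.
        int bucket = Math.floorMod(key.hashCode(), buckets);
        return String.format("%02d_%s", bucket, key);
    }
}
```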
Column optimization
- Keep column family and column (qualifier) names as short as possible
- Use no more than 3 ColumnFamilies in one table
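Short names matter because every cell stores its rowkey, column family and qualifier alongside the value. The sketch below estimates one cell's size under the classic KeyValue layout; it is an approximation that ignores tags, data block encoding and compression, which can reduce the overhead considerably, and `CellSize` is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

// Rough per-cell size under the classic KeyValue layout:
//   keyLength(4) + valueLength(4) + rowLength(2) + row + familyLength(1) + family
//   + qualifier + timestamp(8) + keyType(1) + value
class CellSize {
    static int keyValueBytes(String row, String family, String qualifier, String value) {
        int r = row.getBytes(StandardCharsets.UTF_8).length;
        int f = family.getBytes(StandardCharsets.UTF_8).length;
        int q = qualifier.getBytes(StandardCharsets.UTF_8).length;
        int v = value.getBytes(StandardCharsets.UTF_8).length;
        return 4 + 4 + 2 + r + 1 + f + q + 8 + 1 + v;
    }
}
```

With the rowkey and value from the shell session above, `fileInfo:name` gives 48 bytes per cell while `f:n` gives 38, and that difference is paid again for every cell stored.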
Schema optimization
- Wide table: a "many columns, few rows" design
- Tall table: a "few columns, many rows" design
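HBase write optimization strategies
- Synchronous batch commits vs. asynchronous batch commits
- WAL tuning: whether the WAL is needed at all, and which durability level to use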
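Batch commits can be sketched with a toy client-side buffer that turns many single-row writes into fewer batch RPCs. `ToyBatchWriter` and its `sendBatch` callback are hypothetical; in the real Java client, synchronous batches are usually sent with `Table.put(List<Put>)`, asynchronous buffered writes go through `BufferedMutator`, and the WAL level is chosen per mutation with the `Durability` enum (for example `SKIP_WAL` or `ASYNC_WAL`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy write buffer: edits are sent as one batch once batchSize is reached.
class ToyBatchWriter {
    private final int batchSize;
    private final Consumer<List<String>> sendBatch;   // stands in for one RPC to a RegionServer
    private final List<String> buffer = new ArrayList<>();

    ToyBatchWriter(int batchSize, Consumer<List<String>> sendBatch) {
        this.batchSize = batchSize;
        this.sendBatch = sendBatch;
    }

    void put(String edit) {
        buffer.add(edit);
        if (buffer.size() >= batchSize) flush();
    }

    // Must also be called before shutdown, otherwise the tail of the buffer is never sent.
    void flush() {
        if (buffer.isEmpty()) return;
        sendBatch.accept(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

Seven writes with a batch size of 3 become three RPCs (3, 3 and a final 1) instead of seven; the cost is that buffered edits are lost if the client dies before flushing.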
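HBase read optimization strategies
- Client side: Scan caching settings, batched gets
- Server side: whether the BlockCache is configured sensibly, whether there are too many HFiles
- Table schema design problems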
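The effect of Scan caching (the number of rows returned per RPC, set in the Java client with `Scan.setCaching(int)`) can be estimated with simple arithmetic. This is a rough sketch: real scanners may need an extra call to detect the end of a Region, and result-size limits can also cap each response; `ScanCost` is a hypothetical helper.

```java
// Rough estimate of scanner round trips for a given caching value.
class ScanCost {
    static long rpcCount(long rows, int caching) {
        if (caching < 1) throw new IllegalArgumentException("caching must be >= 1");
        return (rows + caching - 1) / caching;   // ceil(rows / caching)
    }
}
```

Scanning 10,000 rows with caching 1 costs about 10,000 round trips, versus about 100 with caching 100; larger values trade fewer RPCs for more client memory per call.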
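HBase Coprocessor
HBase coprocessors were inspired by the coprocessors in Bigtable; they provide a library and a runtime environment that let user code run on HBase RegionServers and Masters
System coprocessors and table coprocessors
Observer and Endpoint
System coprocessor: loaded globally on all tables and Regions hosted by a RegionServer
Table coprocessor: the user specifies that a particular table uses the coprocessor
Observer: similar to a trigger in a relational database
Endpoint: a dynamic endpoint, somewhat like a stored procedure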
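Observer
RegionObserver: hooks for client data-manipulation events: Get, Put, Delete, Scan, etc.
MasterObserver: hooks for DDL-type operations, such as creating, deleting, and modifying tables
WALObserver: hooks for WAL-related operations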
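The trigger-like Observer idea can be sketched in plain Java: hooks run before and after each Put, and a pre-hook can reject the write. `ToyRegionObserver` and `ToyObservedRegion` are hypothetical and far simpler than HBase's real `RegionObserver`, whose `prePut`/`postPut` methods receive an `ObserverContext` and the `Put` itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical observer interface: both hooks are optional.
interface ToyRegionObserver {
    // Return false to reject the write before it is applied.
    default boolean prePut(String row, String value) { return true; }
    default void postPut(String row, String value) { }
}

class ToyObservedRegion {
    final Map<String, String> data = new TreeMap<>();
    private final List<ToyRegionObserver> observers = new ArrayList<>();

    void addObserver(ToyRegionObserver o) { observers.add(o); }

    boolean put(String row, String value) {
        for (ToyRegionObserver o : observers) {
            if (!o.prePut(row, value)) return false;   // an observer vetoed the write
        }
        data.put(row, value);
        for (ToyRegionObserver o : observers) o.postPut(row, value);   // e.g. auditing
        return true;
    }
}
```

A validation check would live in the pre-hook and an audit log or secondary-index update in the post-hook, which is the same split the real RegionObserver offers.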