Ways to Load Data into Hive

Official docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

* [LOCAL]
Indicates whether the data file is on the local filesystem or on HDFS.
When loading a file that is already on HDFS, the file is in fact moved into the directory of the corresponding Hive table.
* [OVERWRITE]
Indicates whether to overwrite the data already in the table.
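As a quick illustration of the two flags, a minimal sketch (the paths, and the partitioned table in the last statement, are hypothetical and not part of the session below):

-- copy a local file into the table, appending to existing data
load data local inpath '/tmp/emp.txt' into table emp ;
-- move an HDFS file into the table, replacing existing data
load data inpath '/tmp/emp.txt' overwrite into table emp ;
-- for a table partitioned on dt (hypothetical), a target partition can be named
load data local inpath '/tmp/emp_2015.txt' into table emp_part partition (dt='2015-10-19') ;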

1. Preparation

use db_hive_0927 ;
drop table if exists emp ;

create table emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;

2. Load a local file into Hive

load data local inpath '/opt/hive-0.13.1/emp.txt' into table emp ;

hive> load data local inpath '/opt/hive-0.13.1/emp.txt' into table emp ;
Copying data from file:/opt/hive-0.13.1/emp.txt
Copying file: file:/opt/hive-0.13.1/emp.txt
Loading data to table db_hive_0927.emp
Table db_hive_0927.emp stats: [numFiles=1, numRows=0, totalSize=656, rawDataSize=0]
OK
Time taken: 4.471 seconds


3. Load a file from HDFS into Hive

load data inpath '/emp.txt' into table emp ;

hive> load data inpath '/emp.txt' into table emp ;
Loading data to table db_hive_0927.emp
Table db_hive_0927.emp stats: [numFiles=2, numRows=0, totalSize=1312, rawDataSize=0]
OK
Time taken: 11.055 seconds


4. Overwrite existing data in a table

load data inpath '/emp.txt' overwrite into table emp ;


hive> load data inpath '/emp.txt' overwrite into table emp ;
FAILED: SemanticException Line 1:17 Invalid path ''/emp.txt'': No files matching path hdfs://cluster/emp.txt


This fails because loading HDFS data into a table actually moves the file into the Hive table's directory: the load in step 3 already moved /emp.txt away, so there is no longer any file at that path.
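Putting the file back on HDFS first makes the overwrite load succeed; a minimal sketch, reusing the local copy from step 2:

# shell: re-upload the file (the load in step 3 moved the old copy away)
hadoop fs -put /opt/hive-0.13.1/emp.txt /emp.txt

-- then, back in the hive shell:
load data inpath '/emp.txt' overwrite into table emp ;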


5. Load data while creating a table via SELECT (CTAS)

create table db_hive_0927.emp_load as  select empno, ename, deptno from db_hive_0927.emp ;

hive> create table db_hive_0927.emp_load as  select empno, ename, deptno from db_hive_0927.emp ;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0006, Tracking URL = http://cloud1:8088/proxy/application_1445142802171_0006/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1445142802171_0006
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2015-10-19 08:49:57,599 Stage-1 map = 0%,  reduce = 0%
2015-10-19 08:50:58,584 Stage-1 map = 0%,  reduce = 0%
2015-10-19 08:51:28,175 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 5.75 sec
2015-10-19 08:52:28,757 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 6.66 sec
2015-10-19 08:52:59,494 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 12.5 sec
MapReduce Total cumulative CPU time: 12 seconds 500 msec
Ended Job = job_1445142802171_0006
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0007, Tracking URL = http://cloud1:8088/proxy/application_1445142802171_0007/
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1445142802171_0007
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-10-19 08:54:11,395 Stage-3 map = 0%,  reduce = 0%
2015-10-19 08:54:26,512 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.43 sec
MapReduce Total cumulative CPU time: 2 seconds 430 msec
Ended Job = job_1445142802171_0007
Moving data to: hdfs://cluster/user/hive/warehouse/db_hive_0927.db/emp_load
Table db_hive_0927.emp_load stats: [numFiles=1, numRows=0, totalSize=392, rawDataSize=0]
MapReduce Jobs Launched: 
Job 0: Map: 2   Cumulative CPU: 13.36 sec   HDFS Read: 1751 HDFS Write: 574 SUCCESS
Job 1: Map: 1   Cumulative CPU: 2.43 sec   HDFS Read: 761 HDFS Write: 392 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 790 msec
OK
Time taken: 378.773 seconds
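A quick way to confirm what CTAS produced (these checks are not part of the original session):

desc db_hive_0927.emp_load ;
select * from db_hive_0927.emp_load limit 5 ;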


6. Create the table first, then load data via INSERT

hive> create table db_hive_0927.emp_ins like db_hive_0927.emp ;
OK
Time taken: 0.505 seconds


hive> desc db_hive_0927.emp_ins;
OK
empno               	int                 	                    
ename               	string              	                    
job                 	string              	                    
mgr                 	int                 	                    
hiredate            	string              	                    
sal                 	double              	                    
comm                	double              	                    
deptno              	int                 	                    
Time taken: 0.587 seconds, Fetched: 8 row(s)


use db_hive_0927 ;

Append data:
insert into table emp_ins select * from emp ; 

hive> insert into table emp_ins select * from emp ;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0008, Tracking URL = N/A
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1445142802171_0008
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2015-10-19 09:04:33,042 Stage-1 map = 0%,  reduce = 0%
2015-10-19 09:05:18,854 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 9.72 sec
MapReduce Total cumulative CPU time: 9 seconds 720 msec
Ended Job = job_1445142802171_0008
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0009, Tracking URL = N/A
Kill Command = /opt/hadoop/bin/hadoop job  -kill job_1445142802171_0009
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-10-19 09:08:57,711 Stage-3 map = 0%,  reduce = 0%
2015-10-19 09:09:58,092 Stage-3 map = 0%,  reduce = 0%
2015-10-19 09:11:08,156 Stage-3 map = 0%,  reduce = 0%
2015-10-19 09:12:09,288 Stage-3 map = 0%,  reduce = 0%
2015-10-19 09:13:12,190 Stage-3 map = 0%,  reduce = 0%
2015-10-19 09:14:49,177 Stage-3 map = 0%,  reduce = 0%
2015-10-19 09:15:53,783 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 4.28 sec
MapReduce Total cumulative CPU time: 4 seconds 280 msec
Ended Job = job_1445142802171_0009
Loading data to table db_hive_0927.emp_ins
Table db_hive_0927.emp_ins stats: [numFiles=1, numRows=28, totalSize=1322, rawDataSize=1294]
MapReduce Jobs Launched: 
Job 0: Map: 2   Cumulative CPU: 10.46 sec   HDFS Read: 1751 HDFS Write: 1476 SUCCESS
Job 1: Map: 1   Cumulative CPU: 7.96 sec   HDFS Read: 1691 HDFS Write: 1322 SUCCESS
Total MapReduce CPU Time Spent: 18 seconds 420 msec
OK
Time taken: 1180.09 seconds


Overwrite the data in the table (the two statements below are equivalent):
insert overwrite table emp_ins select * from emp ;
from emp insert overwrite table emp_ins select * ;
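The from-first form pays off with multi-table inserts, where Hive scans the source once and writes to several targets. A sketch, in which emp_ins2 is a hypothetical second table with the same schema as emp:

from emp
insert overwrite table emp_ins  select *
insert into table emp_ins2      select * where deptno = 10 ;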

7. Specify the data storage location with LOCATION when creating the table
create external table emp_ext_loc like emp 
location '/test/emp.txt' ;

This errors out when run. The most likely cause: LOCATION must point to a directory, not to a single file such as '/test/emp.txt'.
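The usual fix is to point LOCATION at a directory that contains the data file; a sketch under that assumption (the directory path is hypothetical):

# shell: create a directory on HDFS and put the data file inside it
hadoop fs -mkdir -p /test/emp
hadoop fs -put /opt/hive-0.13.1/emp.txt /test/emp/

-- hive: point the external table at the directory, not at the file
create external table emp_ext_loc like emp
location '/test/emp' ;
select * from emp_ext_loc limit 5 ;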