Ways to load data into Hive
Official docs: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
* [LOCAL]
Indicates whether the data file is on the local filesystem or on HDFS.
When loading a file that is already on HDFS into a table, the file is actually moved from its original HDFS location into the table's directory under the Hive warehouse.
* [OVERWRITE]
Indicates whether to overwrite the data already in the table.
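The combinations of the two flags can be summarized as follows (a sketch; '/tmp/emp.txt' is an illustrative path, not one used later in this article):

```sql
-- LOCAL: copy a file from the local filesystem into the table directory
load data local inpath '/tmp/emp.txt' into table emp ;

-- no LOCAL: move a file already on HDFS into the table directory
load data inpath '/tmp/emp.txt' into table emp ;

-- OVERWRITE: first delete whatever the table directory already holds
load data local inpath '/tmp/emp.txt' overwrite into table emp ;
load data inpath '/tmp/emp.txt' overwrite into table emp ;
```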
1. Preparation:
use db_hive_0927 ;
drop table if exists emp ;
create table emp(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
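With FIELDS TERMINATED BY '\t', each line of emp.txt must carry the eight columns separated by tabs. For illustration only (these values are made up, not the actual contents of the file), a row could look like:

```
7369	SMITH	CLERK	7902	1980-12-17	800.0	300.0	20
```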
2. Loading a local file into Hive

load data local inpath '/opt/hive-0.13.1/emp.txt' into table emp ;

hive> load data local inpath '/opt/hive-0.13.1/emp.txt' into table emp ;
Copying data from file:/opt/hive-0.13.1/emp.txt
Copying file: file:/opt/hive-0.13.1/emp.txt
Loading data to table db_hive_0927.emp
Table db_hive_0927.emp stats: [numFiles=1, numRows=0, totalSize=656, rawDataSize=0]
OK
Time taken: 4.471 seconds
3. Loading a file from HDFS into Hive
load data inpath '/emp.txt' into table emp ;
hive> load data inpath '/emp.txt' into table emp ;
Loading data to table db_hive_0927.emp
Table db_hive_0927.emp stats: [numFiles=2, numRows=0, totalSize=1312, rawDataSize=0]
OK
Time taken: 11.055 seconds
4. Overwriting the data already in the table
load data inpath '/emp.txt' overwrite into table emp ;
hive> load data inpath '/emp.txt' overwrite into table emp ;
FAILED: SemanticException Line 1:17 Invalid path ''/emp.txt'': No files matching path hdfs://cluster/emp.txt
This fails! Because loading an HDFS file into a table actually moves the file into the table's directory, the load in step 3 already moved /emp.txt away, so no file is found at that path.
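To make the OVERWRITE load succeed, the file has to be put back on HDFS first. A possible fix, using the dfs command available inside the Hive CLI (the local path is the one used in step 2):

```sql
-- Re-upload the file that the previous load moved into the table directory
dfs -put /opt/hive-0.13.1/emp.txt /emp.txt ;

-- Now the overwrite load finds the file and replaces the table's data
load data inpath '/emp.txt' overwrite into table emp ;
```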
5. Loading data through a SELECT when creating the table (CREATE TABLE ... AS SELECT)
create table db_hive_0927.emp_load as select empno, ename, deptno from db_hive_0927.emp ;
hive> create table db_hive_0927.emp_load as select empno, ename, deptno from db_hive_0927.emp ;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0006, Tracking URL = http://cloud1:8088/proxy/application_1445142802171_0006/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1445142802171_0006
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2015-10-19 08:49:57,599 Stage-1 map = 0%, reduce = 0%
2015-10-19 08:50:58,584 Stage-1 map = 0%, reduce = 0%
2015-10-19 08:51:28,175 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 5.75 sec
2015-10-19 08:52:28,757 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 6.66 sec
2015-10-19 08:52:59,494 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 12.5 sec
MapReduce Total cumulative CPU time: 12 seconds 500 msec
Ended Job = job_1445142802171_0006
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0007, Tracking URL = http://cloud1:8088/proxy/application_1445142802171_0007/
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1445142802171_0007
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-10-19 08:54:11,395 Stage-3 map = 0%, reduce = 0%
2015-10-19 08:54:26,512 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.43 sec
MapReduce Total cumulative CPU time: 2 seconds 430 msec
Ended Job = job_1445142802171_0007
Moving data to: hdfs://cluster/user/hive/warehouse/db_hive_0927.db/emp_load
Table db_hive_0927.emp_load stats: [numFiles=1, numRows=0, totalSize=392, rawDataSize=0]
MapReduce Jobs Launched:
Job 0: Map: 2   Cumulative CPU: 13.36 sec   HDFS Read: 1751   HDFS Write: 574   SUCCESS
Job 1: Map: 1   Cumulative CPU: 2.43 sec   HDFS Read: 761   HDFS Write: 392   SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 790 msec
OK
Time taken: 378.773 seconds
6. Loading data with INSERT after creating the table
hive> create table db_hive_0927.emp_ins like db_hive_0927.emp ;
OK
Time taken: 0.505 seconds

hive> desc db_hive_0927.emp_ins;
OK
empno     int
ename     string
job       string
mgr       int
hiredate  string
sal       double
comm      double
deptno    int
Time taken: 0.587 seconds, Fetched: 8 row(s)
use db_hive_0927 ;
Append data:
insert into table emp_ins select * from emp ;
hive> insert into table emp_ins select * from emp ;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0008, Tracking URL = N/A
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1445142802171_0008
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2015-10-19 09:04:33,042 Stage-1 map = 0%, reduce = 0%
2015-10-19 09:05:18,854 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 9.72 sec
MapReduce Total cumulative CPU time: 9 seconds 720 msec
Ended Job = job_1445142802171_0008
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1445142802171_0009, Tracking URL = N/A
Kill Command = /opt/hadoop/bin/hadoop job -kill job_1445142802171_0009
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2015-10-19 09:08:57,711 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:09:58,092 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:11:08,156 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:12:09,288 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:13:12,190 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:14:49,177 Stage-3 map = 0%, reduce = 0%
2015-10-19 09:15:53,783 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 4.28 sec
MapReduce Total cumulative CPU time: 4 seconds 280 msec
Ended Job = job_1445142802171_0009
Loading data to table db_hive_0927.emp_ins
Table db_hive_0927.emp_ins stats: [numFiles=1, numRows=28, totalSize=1322, rawDataSize=1294]
MapReduce Jobs Launched:
Job 0: Map: 2   Cumulative CPU: 10.46 sec   HDFS Read: 1751   HDFS Write: 1476   SUCCESS
Job 1: Map: 1   Cumulative CPU: 7.96 sec   HDFS Read: 1691   HDFS Write: 1322   SUCCESS
Total MapReduce CPU Time Spent: 18 seconds 420 msec
OK
Time taken: 1180.09 seconds
Overwrite the data in the table (the second statement is the equivalent FROM-first form of the same insert):
insert overwrite table emp_ins select * from emp ;
from emp insert overwrite table emp_ins select * ;
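The FROM-first form pays off with Hive's multi-table insert: one scan of emp can feed several targets at once. A sketch (emp_ins2 is a hypothetical second table with the same schema as emp, not created above):

```sql
from emp
insert overwrite table emp_ins
  select *
insert into table emp_ins2
  select * where deptno = 20 ;
```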
7. Specifying the data's storage location with LOCATION when creating the table
create external table emp_ext_loc like emp
location '/test/emp.txt' ;
There seems to be a problem here!!! Help wanted — running this reports an error!!!
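A likely cause (an educated guess, not verified against this exact Hive version): LOCATION is expected to point at a directory, not at a single file. Uploading the file into a dedicated directory and pointing LOCATION at that directory should work (the path /test/emp_ext is illustrative):

```sql
-- Put the data file inside a directory of its own
dfs -mkdir -p /test/emp_ext ;
dfs -put /opt/hive-0.13.1/emp.txt /test/emp_ext ;

-- Point the external table's LOCATION at the directory, not the file
create external table emp_ext_loc like emp
location '/test/emp_ext' ;

select * from emp_ext_loc limit 5 ;
```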