复制代码

Sqoop笔记
1.下载
https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/
本次下载:https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
解压到/home/hadoop/sqoop
拷贝sqoop-1.4.7.jar到/home/hadoop/sqoop/lib目录下

2.配置conf/sqoop-env.sh
export HADOOP_COMMON_HOME=/home/hadoop/hadoop
export HADOOP_MAPRED_HOME=/home/hadoop/hadoop
export HIVE_HOME=/home/hadoop/hive

3.配置环境变量~/.bashrc
验证sqoop
sqoop version

4.执行命名报错,主要是确实一些tar
sqoop/lib
sqoop-1.4.7.jar
mysql-connector-java-5.1.39-bin.jar
commons-lang3-3.3.2.jar
avro-tools-1.8.1.jar 
下载地址:https://maven.aliyun.com/mvn/search

5.将mysql数据库中的表数据导入到HDFS指定目录
bin/sqoop import \
--connect jdbc:mysql://192.168.15.45:3307/a_spider_test \
--username root \
--password root \
--target-dir /sqoop/post \
--table post --m 1

6.导入表数据子集
sqoop import \
    --connect jdbc:mysql://192.168.15.45:3307/a_spider_test \
    --username root \
    --password root \
    --where "user_id>1" \
    --target-dir /sqoop/post2 \
    --table post \
    --m 1

7.增量导入
sqoop import  \
    --connect jdbc:mysql://192.168.15.45:3307/a_spider_test \
    --username root \
    --password root \
    --target-dir /sqoop/post \
    --table post \
    --m 1 \
    --incremental append \
    --check-column id \
    --last-value 9

hadoop fs -cat /sqoop/post/part-m-*

8.hdfs导出数据到mysql
sqoop export \
    --connect jdbc:mysql://192.168.15.45:3307/a_spider_test?characterEncoding=UTF-8 \
    --username root \
    --password root \
    --table student \
    --m 1 \
    --export-dir /user/hive/warehouse/jstudent

characterEncoding=UTF-8解决导入到mysql的数据中文乱码问题。


9.作业(job)环境配置
配置mysql为metastore
修改配置文件sqoop-site.xml,以下配置默认是被注释掉的,现在将它都打开:
  <property>
    <name>sqoop.metastore.client.autoconnect.url</name>
    <value>jdbc:mysql://192.168.15.45:3307/sqoop?createDatabaseIfNotExist=true</value>
    <description>The connect string to use when connecting to a
      job-management metastore. If unspecified, uses ~/.sqoop/.
      You can specify a different path here.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.username</name>
    <value>root</value>
    <description>The username to bind to the metastore.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.client.autoconnect.password</name>
    <value>root</value>
    <description>The password to bind to the metastore.
    </description>
  </property>
<property>
    <name>sqoop.metastore.client.record.password</name>
    <value>true</value>
    <description>If true, allow saved passwords in the metastore.
    </description>
  </property>
  <property>
    <name>sqoop.jobbase.serialize.sqoopoptions</name>
    <value>true</value>
    <description>If true, then all options will be serialized into job.xml
    </description>
  </property>
  <property>
    <name>sqoop.metastore.server.location</name>
    <value>/tmp/sqoop-metastore/shared.db</value>
    <description>Path to the shared metastore database files.
    If this is not set, it will be placed in ~/.sqoop/.
    </description>
  </property>
  <property>
    <name>sqoop.metastore.server.port</name>
    <value>16000</value>
    <description>Port that this metastore should listen on.
    </description>
  </property>

10.启动metastore
sqoop metastore &

11.运行sqoop job --list 报错
java.io.IOException: Invalid metadata version
解决办法:
进入mysql的sqoop数据库中,自动生成了SQOOP_ROOT表
在表中插入:
INSERT INTO SQOOP_ROOT VALUES(NULL,'sqoop.hsqldb.job.storage.version','0');
再次运行就不会报错

12.创建job过程又报错,缺少jar包
cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-beanutils-1.7.0.jar /home/hadoop/sqoop/lib/
cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-collections-3.2.2.jar /home/hadoop/sqoop/lib/
cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-lang-2.6.jar /home/hadoop/sqoop/lib/
cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-logging-1.2.jar /home/hadoop/sqoop/lib/
cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/json-20070829.jar /home/hadoop/sqoop/lib/


13.创建import的job作业
sqoop job --create myjob1 \
    -- import \
    --connect jdbc:mysql://192.168.15.45:3307/a_spider_test \
    --username root \
    --password root \
    --target-dir /sqoop/zhilian \
    --table zhiliancookies \
    --m 1

14.执行作业
sqoop job --exec myjob1

15.代码定制
sqoop codegen --connect jdbc:mysql://192.168.15.45:3307/a_spider_test --username root --password root --table post

 转自:https://www.cnblogs.com/zhangkaipc/p/11890858.html