1.下载 https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/ 本次下载:https://mirrors.tuna.tsinghua.edu.cn/apache/sqoop/1.4.7/sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz 解压到/home/hadoop/sqoop 拷贝sqoop-1.4.7.jar到/home/hadoop/sqoop/lib目录下 2.配置conf/sqoop-env.sh export HADOOP_COMMON_HOME=/home/hadoop/hadoop export HADOOP_MAPRED_HOME=/home/hadoop/hadoop export HIVE_HOME=/home/hadoop/hive 3.配置环境变量~/.bashrc 验证sqoop sqoop version 4.执行命名报错,主要是确实一些tar sqoop/lib sqoop-1.4.7.jar mysql-connector-java-5.1.39-bin.jar commons-lang3-3.3.2.jar avro-tools-1.8.1.jar 下载地址:https://maven.aliyun.com/mvn/search 5.将mysql数据库中的表数据导入到HDFS指定目录 bin/sqoop import \ --connect jdbc:mysql://192.168.15.45:3307/a_spider_test \ --username root \ --password root \ --target-dir /sqoop/post \ --table post --m 1 6.导入表数据子集 sqoop import \ --connect jdbc:mysql://192.168.15.45:3307/a_spider_test \ --username root \ --password root \ --where "user_id>1" \ --target-dir /sqoop/post2 \ --table post \ --m 1 7.增量导入 sqoop import \ --connect jdbc:mysql://192.168.15.45:3307/a_spider_test \ --username root \ --password root \ --target-dir /sqoop/post \ --table post \ --m 1 \ --incremental append \ --check-column id \ --last-value 9 hadoop fs -cat /sqoop/post/part-m-* 8.hdfs导出数据到mysql sqoop export \ --connect jdbc:mysql://192.168.15.45:3307/a_spider_test?characterEncoding=UTF-8 \ --username root \ --password root \ --table student \ --m 1 \ --export-dir /user/hive/warehouse/jstudent characterEncoding=UTF-8解决导入到mysql的数据中文乱码问题。 9.作业(job)环境配置 配置mysql为metastore 修改配置文件sqoop-site.xml,以下配置默认是被注释掉的,现在将它都打开: <property> <name>sqoop.metastore.client.autoconnect.url</name> <value>jdbc:mysql://192.168.15.45:3307/sqoop?createDatabaseIfNotExist=true</value> <description>The connect string to use when connecting to a job-management metastore. If unspecified, uses ~/.sqoop/. You can specify a different path here. </description> </property> <property> <name>sqoop.metastore.client.autoconnect.username</name> <value>root</value> <description>The username to bind to the metastore. </description> </property> <property> <name>sqoop.metastore.client.autoconnect.password</name> <value>root</value> <description>The password to bind to the metastore. </description> </property> <property> <name>sqoop.metastore.client.record.password</name> <value>true</value> <description>If true, allow saved passwords in the metastore. </description> </property> <property> <name>sqoop.jobbase.serialize.sqoopoptions</name> <value>true</value> <description>If true, then all options will be serialized into job.xml </description> </property> <property> <name>sqoop.metastore.server.location</name> <value>/tmp/sqoop-metastore/shared.db</value> <description>Path to the shared metastore database files. If this is not set, it will be placed in ~/.sqoop/. </description> </property> <property> <name>sqoop.metastore.server.port</name> <value>16000</value> <description>Port that this metastore should listen on. </description> </property> 10.启动metastore sqoop metastore & 11.运行sqoop job --list 报错 java.io.IOException: Invalid metadata version 解决办法: 进入mysql的sqoop数据库中,自动生成了SQOOP_ROOT表 在表中插入: INSERT INTO SQOOP_ROOT VALUES(NULL,'sqoop.hsqldb.job.storage.version','0'); 再次运行就不会报错 12.创建job过程又报错,缺少jar包 cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-beanutils-1.7.0.jar /home/hadoop/sqoop/lib/ cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-collections-3.2.2.jar /home/hadoop/sqoop/lib/ cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-lang-2.6.jar /home/hadoop/sqoop/lib/ cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/commons-logging-1.2.jar /home/hadoop/sqoop/lib/ cp ./azkaban/azkaban-solo-server/build/install/azkaban-solo-server/lib/json-20070829.jar /home/hadoop/sqoop/lib/ 13.创建import的job作业 sqoop job --create myjob1 \ -- import \ --connect jdbc:mysql://192.168.15.45:3307/a_spider_test \ --username root \ --password root \ --target-dir /sqoop/zhilian \ --table zhiliancookies \ --m 1 14.执行作业 sqoop job --exec myjob1 15.代码定制 sqoop codegen --connect jdbc:mysql://192.168.15.45:3307/a_spider_test --username root --password root --table post