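Hadoop is an essential system in big-data environments. A Hadoop cluster lets you share server resources efficiently, and it has been used for offline (batch) processing for many years.
Spring Hadoop simplifies working with Apache Hadoop by providing a unified configuration model and easy-to-use APIs for HDFS, MapReduce, Pig, and Hive. It also integrates with other Spring ecosystem projects such as Spring Integration and Spring Batch.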
Official documentation and API reference for Spring Hadoop 2.5:
spring-hadoop documentation
spring-hadoop API
Spring Hadoop
1. Add the repository and configure dependencies
```xml
<repositories>
  <repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>

<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <hadoop.version>2.6.0-cdh5.7.0</hadoop.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
    <scope>provided</scope>
  </dependency>

  <dependency>
    <groupId>com.kumkee</groupId>
    <artifactId>UserAgentParser</artifactId>
    <version>0.0.1</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>4.10</version>
    <scope>test</scope>
  </dependency>

  <dependency>
    <groupId>org.springframework.data</groupId>
    <artifactId>spring-data-hadoop</artifactId>
    <version>2.5.0.RELEASE</version>
  </dependency>
</dependencies>
```
2. Add the Hadoop configuration to the Spring configuration file
```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xmlns:context="http://www.springframework.org/schema/context"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd">
    <!-- ...... -->

    <!-- Load application.properties so that ${...} placeholders can be resolved -->
    <context:property-placeholder location="application.properties"/>

    <!-- Hadoop configuration: the HDFS address comes from the properties file -->
    <hdp:configuration id="hadoopConfiguration">
        fs.defaultFS=${spring.hadoop.fsUri}
    </hdp:configuration>

    <!-- HDFS FileSystem bean, accessed as user "root" -->
    <hdp:file-system id="fileSystem" configuration-ref="hadoopConfiguration" user="root"/>
</beans>
```
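You only need to configure the Hadoop server's URL and load the properties file.
Then create a properties file named application.properties and add the Hadoop configuration to it.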
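As a rough illustration of what application.properties might contain, the sketch below parses a one-line properties file and substitutes its value into the `fs.defaultFS=${spring.hadoop.fsUri}` entry, which is roughly what `<context:property-placeholder>` does for you. The address `hdfs://localhost:8020` and the class name `FsUriPlaceholderDemo` are assumptions made for illustration; replace the address with that of your own NameNode.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class FsUriPlaceholderDemo {

    // Hypothetical content of application.properties; host and port are assumptions.
    static final String SAMPLE_PROPERTIES = "spring.hadoop.fsUri=hdfs://localhost:8020\n";

    /** Replaces every ${key} in template with the matching value from propertiesText. */
    static String resolve(String propertiesText, String template) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String resolved = template;
        for (String key : props.stringPropertyNames()) {
            resolved = resolved.replace("${" + key + "}", props.getProperty(key));
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Prints: fs.defaultFS=hdfs://localhost:8020
        System.out.println(resolve(SAMPLE_PROPERTIES, "fs.defaultFS=${spring.hadoop.fsUri}"));
    }
}
```

This is a plain-Java stand-in, not Spring's own placeholder resolver; it only shows which key the properties file needs to define.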
test
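```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import org.springframework.context.ApplicationContext;
import org.springframework.context.support.ClassPathXmlApplicationContext;

public class SpringHadoopApp {

    private ApplicationContext ctx;
    private FileSystem fileSystem;

    // Load the Spring context and look up the FileSystem bean defined in the XML.
    @Before
    public void setUp() {
        ctx = new ClassPathXmlApplicationContext("applicationContext.xml");
        fileSystem = (FileSystem) ctx.getBean("fileSystem");
    }

    // Release the HDFS connection before dropping the context reference.
    @After
    public void tearDown() throws IOException {
        fileSystem.close();
        ctx = null;
    }

    /**
     * Create a directory on HDFS.
     * @throws Exception if the directory cannot be created
     */
    @Test
    public void testMkdirs() throws Exception {
        fileSystem.mkdirs(new Path("/SpringHDFS/"));
    }
}
```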
Alternatively, you can configure it by loading Hadoop's configuration files directly.
Spring Data HBase
Add dependencies
```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-auth</artifactId>
  <!-- No version is given here; it is assumed to be managed elsewhere (e.g. dependencyManagement) -->
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.2.3</version>
  <scope>compile</scope>
  <exclusions>
    <exclusion>
      <groupId>log4j</groupId>
      <artifactId>log4j</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```
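Copy the HBase configuration file and integrate applicationContext.xml
Copy HBase's configuration file hbase-site.xml into the resources directory, then create a new Spring configuration file named applicationContext.xml.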
```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:context="http://www.springframework.org/schema/context"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
           http://www.springframework.org/schema/hadoop http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <context:annotation-config/>
    <context:component-scan base-package="com.sample.hbase"/>
    <!-- No id is set, so this bean gets the default name "hadoopConfiguration" -->
    <hdp:configuration resources="hbase-site.xml"/>
    <!-- Likewise registered under the default name "hbaseConfiguration" -->
    <hdp:hbase-configuration configuration-ref="hadoopConfiguration"/>
    <bean id="hbaseTemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
        <property name="configuration" ref="hbaseConfiguration"/>
    </bean>
</beans>
```
This configures HbaseTemplate and the location of the HBase configuration file.
test
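```java
import java.util.List;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Assert;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.hadoop.hbase.HbaseTemplate;
import org.springframework.data.hadoop.hbase.RowMapper;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.SpringJUnit4ClassRunner;

@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = {"classpath*:applicationContext.xml"})
public class BaseTest {

    @Autowired
    private HbaseTemplate template;

    // Read the cf:name column of every row in the "user" table.
    @Test
    public void testFind() {
        List<String> rows = template.find("user", "cf", "name", new RowMapper<String>() {
            @Override
            public String mapRow(Result result, int rowNum) throws Exception {
                return result.toString();
            }
        });
        Assert.assertNotNull(rows);
    }

    // Write "Alice" to cf:name in the row whose key is "xiaogao".
    @Test
    public void testPut() {
        template.put("user", "xiaogao", "cf", "name", Bytes.toBytes("Alice"));
    }
}
```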
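To make the argument order of `put` and `find` in the test concrete: HBase addresses a cell by table, row key, column family, and qualifier. The sketch below models a single table as nested sorted maps. It is a plain-Java stand-in (the class `InMemoryTableSketch` is hypothetical), not the HBase client or HbaseTemplate, and it ignores timestamps and cell versions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InMemoryTableSketch {

    // row key -> column family -> qualifier -> value.
    // Rows are kept sorted by key; HBase sorts by raw bytes, String order is an approximation.
    private final Map<String, Map<String, Map<String, byte[]>>> rows = new TreeMap<>();

    /** Stores one cell, creating the row and family maps on demand. */
    void put(String rowKey, String family, String qualifier, byte[] value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, k -> new TreeMap<>())
            .put(qualifier, value);
    }

    /** Returns the family:qualifier value of every row that has one, in row-key order. */
    List<String> find(String family, String qualifier) {
        List<String> values = new ArrayList<>();
        for (Map<String, Map<String, byte[]>> families : rows.values()) {
            Map<String, byte[]> columns = families.get(family);
            if (columns == null) {
                continue;
            }
            byte[] value = columns.get(qualifier);
            if (value != null) {
                values.add(new String(value, StandardCharsets.UTF_8));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        InMemoryTableSketch user = new InMemoryTableSketch();
        user.put("xiaogao", "cf", "name", "Alice".getBytes(StandardCharsets.UTF_8));
        System.out.println(user.find("cf", "name")); // prints [Alice]
    }
}
```

Read this way, `template.put("user", "xiaogao", "cf", "name", ...)` writes one cell in row `xiaogao`, and `template.find("user", "cf", "name", ...)` scans the table for that column and maps each matching row.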
