https://blog.csdn.net/Richard_vi/article/details/78909939

Environment: IDEA + JDK 8 + Maven 3.5.2

Create a new Maven project and add the OpenNLP dependency:

<dependency>
    <groupId>org.apache.opennlp</groupId>
    <artifactId>opennlp-tools</artifactId>
    <version>1.8.4</version>
</dependency>

With that in place we can use the OpenNLP toolkit. Let's look at some examples:


// divide sentences
public static void SentenceDetect() throws IOException {
    String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
    InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin");
    SentenceModel model = new SentenceModel(is);
    SentenceDetectorME sdetector = new SentenceDetectorME(model);
    String[] sentences = sdetector.sentDetect(paragraph);
    for (String single : sentences) {
        System.out.println(single);
    }
    is.close();
}


This is an English sentence-detection example. First we need to download the sentence model; here I placed it under the E:\NLP_Practics\models\ directory.

More models can be downloaded from:
http://maven.tamingtext.com/opennlp-models/models-1.5/

The corresponding output:


Hi. How are you?

This is JD_Dog.

He is my good friends.He is very kind.but he is no more handsome than me.

Seems magical? Not really. This is just a demonstration with an existing, simple model. A model is abstracted from a large amount of training data, so the quality of the results depends on the model you use.
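Note that neither "friends.He" nor "kind.but" was split: there is no whitespace after those periods. Even a naive, model-free splitter hits the same limit, though unlike the model above it does separate "Hi." from "How are you?". A minimal sketch using only the JDK (class and method names are hypothetical, and the sample paragraph is copied from above):

```java
import java.util.Arrays;
import java.util.List;

public class NaiveSentenceSplit {
    // Split after '.', '!' or '?' only when whitespace follows,
    // so "friends.He" and "kind.but" stay glued together.
    static List<String> split(String paragraph) {
        return Arrays.asList(paragraph.trim().split("(?<=[.!?])\\s+"));
    }

    public static void main(String[] args) {
        String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
        for (String s : split(paragraph)) {
            System.out.println(s);
        }
    }
}
```

A trained model scores each candidate end-of-sentence character in context, so with suitable training data it can behave differently from a fixed rule like this.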

Next, an English tokenization example:


// divide words
public static void Tokenize() throws IOException {
    InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin");
    TokenizerModel model = new TokenizerModel(is);
    Tokenizer tokenizer = new TokenizerME(model);
    String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
    for (String a : tokens)
        System.out.println(a);
    is.close();
}

Output:


Hi

.

How

are

you

?

This

is

Richard

.

Richard

is

still

single

.

please

help

him

find

his

girl
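By contrast, splitting on whitespace alone would leave punctuation attached to the words ("Hi." and "you?" as single tokens), while TokenizerME emits "." and "?" as separate tokens, as shown above. A quick model-free sketch using only the JDK (class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

public class NaiveTokenize {
    // Whitespace-only tokenization: punctuation stays attached to words,
    // unlike the TokenizerME output above.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        for (String t : tokenize("Hi. How are you? This is Richard.")) {
            System.out.println(t);
        }
    }
}
```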


Complete test code:


package package01;

import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class Test01 {

    // divide sentences
    public static void SentenceDetect() throws IOException {
        String paragraph = "Hi. How are you? This is JD_Dog. He is my good friends.He is very kind.but he is no more handsome than me. ";
        InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-sent.bin");
        SentenceModel model = new SentenceModel(is);
        SentenceDetectorME sdetector = new SentenceDetectorME(model);
        String[] sentences = sdetector.sentDetect(paragraph);
        for (String single : sentences) {
            System.out.println(single);
        }
        is.close();
    }

    // divide words
    public static void Tokenize() throws IOException {
        InputStream is = new FileInputStream("E:\\NLP_Practics\\models\\en-token.bin");
        TokenizerModel model = new TokenizerModel(is);
        Tokenizer tokenizer = new TokenizerME(model);
        String[] tokens = tokenizer.tokenize("Hi. How are you? This is Richard. Richard is still single. please help him find his girl");
        for (String a : tokens)
            System.out.println(a);
        is.close();
    }

    public static void main(String[] args) throws IOException {
        // Test01.SentenceDetect();
        Test01.Tokenize();
    }

}


https://github.com/godmaybelieve