A Beginner's Note of CRF++

Thanks to Yandong's help and guidance, I now have some basic ideas about CRFs (Conditional Random Fields) and what a CRF model looks like. The encoder of CRF++, crf_learn, can also write the model in text format when given the '-t' option. Taking the Japanese word segmentation demonstration (example/seg) as an example, the following is the model in text format:

version: 100
cost-factor: 1
maxid: 1386      /* the number of feature functions */
xsize: 1

B                /* the tag list; in this case, we have two tags */
I

U00:%x[-2,0]     /* unigram feature templates */
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[-2,0]/%x[-1,0]/%x[0,0]
U06:%x[-1,0]/%x[0,0]/%x[1,0]
U07:%x[0,0]/%x[1,0]/%x[2,0]
U08:%x[-1,0]/%x[0,0]
U09:%x[0,0]/%x[1,0]
B               /* bigram feature template */

0 B             /* bigram of the tags for C_{-1} and C_0; the number  */
                /* of feature functions is (# of tags)^2, here 4.     */

4 U00:_B-1      /* _B-1 is the virtual token just before the first token of a sentence */
                /* _B+1 is the virtual token just after the last token of a sentence   */

6 U00:_B-2      /* _B-2 is the virtual token before _B-1 */
                /* _B+2 is the virtual token after _B+1  */

8 U00:
10 U00:、       /* feature function id, template id, and observation */
12 U00:〇       /* since we only have two tags, each unigram entry   */
14 U00:「       /* expands to 2 feature functions (one per tag)      */
20 U00:う
... ...
... ...
1382 U09:3/年
1384 U09:9/3

-0.0799963416235706     /* the weight for each feature function; */
0.4346315510326526      /* a negative weight lowers the score of */
-0.1044728887459596     /* the tag it is tied to; there are      */
-0.2501623206703318     /* 1386 weights in total.                */
... ...
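
To make the layout above concrete, here is a minimal Python sketch that reads a text model and looks up the weight behind one entry. It rests on a few assumptions: the real file has no /* ... */ annotations (those in the listing above are explanatory only), its five sections are separated by blank lines, and the weights are laid out the way I understand CRF++ to do it, where a unigram entry with id f owns the weights at f + (tag index) and the bigram entry owns f + (previous tag index) * (# of tags) + (current tag index). The tiny sample model and the helper names are made up for illustration; they are not part of CRF++.

```python
import re

# Hypothetical miniature model in the same layout as the listing above.
SAMPLE_MODEL = """\
version: 100
cost-factor: 1
maxid: 8
xsize: 1

B
I

U00:%x[0,0]
B

0 B
4 U00:a
6 U00:b

0.1
0.2
0.3
0.4
0.5
-0.5
-0.25
0.25
"""


def parse_text_model(text):
    """Split a text model into header, tags, templates, feature ids, weights.

    Assumes exactly five sections separated by blank lines.
    """
    blocks = [b.splitlines() for b in re.split(r"\n\s*\n", text.strip())]
    header_lines, tags, templates, feature_lines, weight_lines = blocks
    header = dict(line.split(": ", 1) for line in header_lines)
    features = {}
    for line in feature_lines:
        fid, name = line.split(" ", 1)   # "4 U00:a" -> id 4, name "U00:a"
        features[name] = int(fid)        # ids start at 0
    weights = [float(w) for w in weight_lines]
    return header, tags, templates, features, weights


def unigram_weight(model, name, tag):
    """Weight of a unigram entry for one tag (assumed layout: id + tag index)."""
    _, tags, _, features, weights = model
    return weights[features[name] + tags.index(tag)]


def bigram_weight(model, name, prev_tag, tag):
    """Weight of a bigram entry (assumed layout: id + prev * ntags + cur)."""
    _, tags, _, features, weights = model
    offset = tags.index(prev_tag) * len(tags) + tags.index(tag)
    return weights[features[name] + offset]


model = parse_text_model(SAMPLE_MODEL)
print(unigram_weight(model, "U00:a", "I"))   # -0.5
print(bigram_weight(model, "B", "I", "B"))   # 0.3
```

With two tags, each unigram entry takes two consecutive weights and the single bigram entry takes four, which is why the ids in the listing start at 0, jump to 4, and then advance in steps of 2.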

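The entries in the feature list of the model come from expanding each template at every token position of the training data. The following is a rough Python sketch of that expansion, assuming that %x[row,col] means "the token `row` positions away from the current one, column `col`", and that positions outside the sentence become _B-n or _B+n as annotated in the listing. The function expand_template is a hypothetical helper, not CRF++'s own code, and the three-token sentence is made up so that it reproduces two entries from the listing.

```python
import re

# Matches one %x[row,col] macro, e.g. %x[-1,0].
MACRO = re.compile(r"%x\[(-?\d+),(\d+)\]")


def expand_template(template, rows, pos):
    """Expand one template line at token position `pos`.

    `rows` is a list of tokens, each a list of column values.
    """
    def lookup(match):
        row = pos + int(match.group(1))
        col = int(match.group(2))
        if row < 0:                       # before the sentence start
            return "_B-%d" % -row
        if row >= len(rows):              # past the sentence end
            return "_B+%d" % (row - len(rows) + 1)
        return rows[row][col]
    return MACRO.sub(lookup, template)


# Illustrative sentence with a single column (the character itself).
sentence = [["9"], ["3"], ["年"]]

print(expand_template("U09:%x[0,0]/%x[1,0]", sentence, 0))  # U09:9/3
print(expand_template("U09:%x[0,0]/%x[1,0]", sentence, 1))  # U09:3/年
print(expand_template("U00:%x[-2,0]", sentence, 0))         # U00:_B-2
print(expand_template("U03:%x[1,0]", sentence, 2))          # U03:_B+1
```

A bare "B" template contains no macro, so it expands to the string "B" itself, which matches the single "0 B" entry in the listing.
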
9 thoughts on “A Beginner's Note of CRF++”

  1. Hi Yong Sun,
    I'm using the CRF++ code in my project and am having a lot of trouble understanding and writing the template file for CRF++.
    I would be thankful if you could help me with this.
    (mail id: rathorekps@gmail.com )

  2. Hi Yong Sun,

    I recently posted this question on metaoptimize qa which is closely related to your blog post.

    I just wanted to let you know as you might have a clue!

    Cheers,
    Matteo

  3. Hi, Matteo, if your trained model is in text format, you can get the weights as I described in this blog entry. I'm not familiar with the binary format, though; I think you will need to read the source code to figure it out.

  4. Hi, Yongsun. I tried to use the Bigram feature template. I just replaced 'U' with 'B'. But it didn't work. The single B is fine. I couldn't figure it out.
    Bigram template I tried is like this: B00:%x[-1,0]
    Thanks for any help.

  5. Hi, Lacey, does crf_learn print any error or warning message when you add bigram features? I just tried 's/U/B/g' on example/seg/template, and it still worked.

  6. Hi Yongsun, I am using CRF++ for my research. Does the model use zero-based indexing for the feature functions?

  7. Hi, I'm a newbie to CRF++ and I really don't know how to get started. I'm doing title extraction. I already have a set of 11 features, but I'm confused about how to convert it to the CRF++ format using feature templates. Can anyone give me a link that can help with my problem? A step-by-step tutorial would be great!
