UVA Problem 12592 – Slogan Learning of Princess Solution KMP Algorithm Implementation and Explanation

Click here to go to this problem in uva Online Judge.

Solving Technique:

Dissecting the problem statement:

The problem is straight forward. Given an input n, there will be 2n lines of input. Since each line also has accompanying next line. They are divided in first and second line. Here only first line is required for calculation. The second line is the output of the program.

Then we will be given say m lines. For each lines we are to match the first lines from previous 2n lines and output the second line.

Solution:

There are easier ways to solve this, but this mainly a KMP ( Knuth Morris Pratt ) algorithm tutorial.

Here the input is “jalo re jalo” which is one the inputs from the problem. The state diagrams are first drawn without following the algorithm. After that DFA table from hand tracing and code output is matched.


 

State Machine diagram without transition to initial state:

Explanation:

Why most states transition back to state 1 on ‘j’ input? Reaching state 1 means ‘j’ is already matched so going back to 0 will produce wrong output. Because that’d mean two j matched or “jj”. Since ‘a’ is after ‘j’ the next task is to match ‘a’ so we go back to state 1 and try to match further inputs.

Notice on state 8 on input of ‘j’ it goes to 9 instead of 1. Because that ‘j’ is also a part of the seqence and the seqence hasn’t broken yet. Finally on state 12 or on the length of the pattern string the input gets is accepted.

kmp dfa without initial transitions
kmp dfa without initial transitions

 

State Machine diagram with transition to initial state:

kmp dfa with initial transitions.png
kmp dfa with initial transitions.png

Calculated DFA table:

DFA table is generated with pattern being one of the input from the problem.
“jalo re jalo”

After generating the DFA table from tracing the algorithm and the DFA table state after running the algorithms both match. Also the states and transitions of the DFA table exactly match the state machine above.

Empty cells means it contains 0 which transitions to initial or start state. Transitions to start state not shown to keep it simple.
kmp calculated dfa table
kmp calculated dfa table

DFA table state after running of KMP algorithm:

Input is the same string as above.

kmp dfa table print
kmp dfa table print

Step by step table fill and explanation: (CreateDFA function)

Note empty cells in the matrix contains zero. They are not shown to keep it simple here but they are there and they transition to start or beginning state which is 0.

Call to createDFA Function:

This is the initial state of the DFA matrix.

\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|}  \hline  & j & a & l & o & ' ' & r & e & ' ' & j & a & l & o \\ \hline  & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline  {\bf ' '} & & & & & & & & & & & & \\ \hline  {\bf a} & & & & & & & & & & & & \\ \hline  {\bf e} & & & & & & & & & & & & \\ \hline  {\bf j} & & & & & & & & & & & & \\ \hline  {\bf l} & & & & & & & & & & & & \\ \hline  {\bf o} & & & & & & & & & & & & \\ \hline  {\bf r} & & & & & & & & & & & & \\ \hline  \end{tabular}


Setting up the next state:

This is execution of line 53 in code below.

DFA[pattern[0]][0] = 1;

Here the pattern is,


pattern = "jalo re jalo";

So, pattern[0] is character ‘j’.


DFA['j'][0] = 1;

This essentially means in the j-th row and 0-th column set 1.


Understanding the transitions:

To understand the how the transitions work column wise lay the pattern. So if the length of pattern is m then there are m columns in the matrix. In row wise layout the set of character in the pattern. So even if there are multiple occurrence of the same character it is only represented as a row once.

By careful inspection of the DFA matrix it can be observed that state change occur when the character in that row matches the column representation of that character.

The row index is represented by the ASCII value of the set of characters in pattern. The column are represented from 0 to m where m is length of the pattern.

\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|}  \hline  & j & a & l & o & ' ' & r & e & ' ' & j & a & l & o \\ \hline  & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline  {\bf ' '} & & & & & & & & & & & & \\ \hline  {\bf a} & & & & & & & & & & & & \\ \hline  {\bf e} & & & & & & & & & & & & \\ \hline  {\bf j} & 1 & & & & & & & & & & & \\ \hline  {\bf l} & & & & & & & & & & & & \\ \hline  {\bf o} & & & & & & & & & & & & \\ \hline  {\bf r} & & & & & & & & & & & & \\ \hline  \end{tabular}


Copying values from x-th column:

Execution of line 60 and 61 in code below.


for(k = 0; k < r; ++k)
DFA[k][j] = DFA[k][x];

This code basically copies values in x-th column to j-th column. Initially x = 0 and j = 1, So values from 0-th column( 0-th index ) are copied into 1-st column ( 1-st index ).

\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|}  \hline  & j & a & l & o & ' ' & r & e & ' ' & j & a & l & o \\ \hline  & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline  {\bf ' '} & & & & & & & & & & & & \\ \hline  {\bf a} & & & & & & & & & & & & \\ \hline  {\bf e} & & & & & & & & & & & & \\ \hline  {\bf j} & 1 & 1 & & & & & & & & & & \\ \hline  {\bf l} & & & & & & & & & & & & \\ \hline  {\bf o} & & & & & & & & & & & & \\ \hline  {\bf r} & & & & & & & & & & & & \\ \hline  \end{tabular}


Updating the next state and Column to to copy from:

Execution of line 66.

DFA[pattern[j]][j] = j + 1;

when, j = 1


DFA[pattern[1]][1] = 1 + 1;

since pattern[1] = ‘a’


DFA['a'][1] = 2;

This replaces the matching row and column matching character cell with value of next state which is the next column number.

Execution of line 71.


x = DFA[pattern[j]][x];

Update x the column number where to copy values from. This means if in the pattern recognition we go back in a previous state then we don’t need to recalculate. Since the transitions in that state are already calculated use those same transitions.

\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|}  \hline  & j & a & l & o & ' ' & r & e & ' ' & j & a & l & o \\ \hline  & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline  {\bf ' '} & & & & & & & & & & & & \\ \hline  {\bf a} & & 2 & & & & & & & & & & \\ \hline  {\bf e} & & & & & & & & & & & & \\ \hline  {\bf j} & 1 & 1 & & & & & & & & & & \\ \hline  {\bf l} & & & & & & & & & & & & \\ \hline  {\bf o} & & & & & & & & & & & & \\ \hline  {\bf r} & & & & & & & & & & & & \\ \hline  \end{tabular}


Skipping Some Steps:

After skipping some steps now value of j counter is 8.
So pattern[8] = j. Note here I’ve used two single quotes to represent the space character.

Here again values from 0-Th column are copied into the 8-Th column.
DFA[‘j’][8] is set to j + 1 = 9.
There was already a value there and it’ll be replaced.

Now the interesting part when updating x ( the column to copy values from )


x = DFA[pattern[j]][x]
x = DFA['j'][0]
x = 1

It already stated there which state to go for on which input.
So x becomes 1. Now in the next iteration values from 1-st column will be copied.

\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|}  \hline  & j & a & l & o & ' ' & r & e & ' ' & j & a & l & o \\ \hline  & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline  {\bf ' '} & & & & & 5 & & & 8 & & & & \\ \hline  {\bf a} & & 2 & & & & & & & & & & \\ \hline  {\bf e} & & & & & & & 7 & & & & & \\ \hline  {\bf j} & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 9 & & & \\ \hline  {\bf l} & & & 3 & & & & & & & & & \\ \hline  {\bf o} & & & & 4 & & & & & & & & \\ \hline  {\bf r} & & & & & & 6 & & & & & & \\ \hline  \end{tabular}


Final State of the DFA table or Matrix:

Similarly follwoing the procedure of copying x-th column, then updating matching character cell to next to next state or index and finally updating x to be the value at current character row and x th column we get this DFA table.
\begin{tabular}{|l|l|l|l|l|l|l|l|l|l|l|l|l|}  \hline  & j & a & l & o & ' ' & r & e & ' ' & j & a & l & o \\ \hline  & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 \\ \hline  {\bf ' '} & & & & & 5 & & & 8 & & & & \\ \hline  {\bf a} & & 2 & & & & & & & & 10 & & \\ \hline  {\bf e} & & & & & & & 7 & & & & & \\ \hline  {\bf j} & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 9 & 1 & 1 & 1 \\ \hline  {\bf l} & & & 3 & & & & & & & & 11 & \\ \hline  {\bf o} & & & & 4 & & & & & & & & 12 \\ \hline  {\bf r} & & & & & & 6 & & & & & & \\ \hline  \end{tabular}


DFAStringMatching Function:

Execution of line 34, 35. Here i seem to have calculated length of pattern twice. This not a problem but doing extra work.

After the DFA matrix is generated. Next apply the given query string to the matrix and see if by transition it is able reach the last state which is length of the DFA string.


Important:  Be sure to add or print a new line after each output unless otherwise specified. The outputs should match exactly because sometimes even a space character causes the answer to be marked as wrong answer. Please compile with c++ compiler as some of my codes are in c and some in c++.


Input:

3
My name is
Asif
what is kmp
string matching algorithm
jalo re jalo
agun jalo
2
jalo re jalo
what is kmp

 


Output:

agun jalo
string matching algorithm

Code:

/*
 * Author:   Quickgrid ( Asif Ahmed )
 * Site:     https://quickgrid.wordpress.com
 * Problem:  UVA 12592 Slogan Learning Princess
 * Solution: Knuth Morris Pratt ( KMP ) algorithm implementation
 */

#include<cstdio>
#include<iostream>
#include<string>

using namespace std;

static string slogan[100];
static string query;

#define r 160

/**
 * Space may be further minimized at the cost of running time
 */
static unsigned DFA[r][100];

unsigned DFAStringMatching(string text, string pattern){
    unsigned m = pattern.length();
    unsigned n = query.length();

    /**
     * i is incrementing by 1, meaning using text[i] whole input string can be traversed.
     * text[i] is a character. DFA[text[i]][j] means text[i] character column and j th row.
     * This position or cell contains the column for next transition.
     */
    unsigned i, j;
    for(i = 0, j = 0; i < n && j < m; ++i)
        j = DFA[text[i]][j];

    /**
     * If j equals m then all transition completed successfully
     * meaning the string are a match.
     */
    if(j == m)
        return 1;
    else
        return 0;
}

void CreateDFA(string pattern){
    unsigned m = pattern.length();

    /**
     * Set the first state to 1
     */
    DFA[pattern[0]][0] = 1;

    unsigned x, j, k;
    for(x = 0, j = 1; j < m; ++j){
        /**
         * Copy all values from x column to j column.
         */
        for(k = 0; k < r; ++k)
            DFA[k][j] = DFA[k][x];

        /**
         * Update position in table to the next transition.
         */
        DFA[pattern[j]][j] = j + 1;

        /**
         * Update the column from which to copy values.
         */
        x = DFA[pattern[j]][x];
    }


    /**
     * Uncomment code below to see transitions to states in DFA table
     * For printing transitions in DFA
     * Delete this before submitting to UVA
     * Empty or Zero in DFA means transition to initial state
     */

    /*
    int val = 0;
    for(j = 0; j < m; ++j){
        for(k = 0; k < r; ++k){
            val = DFA[k][j];
            if(val)
                printf("Transition from state (%d) to state (%d) for input: %c\n", j, DFA[k][j], k);
        }
        printf("\n");
    }
    */

}

int main()
{
    static unsigned n, i, q;

    scanf("%u", &n);
    getchar();

    /**
     * Since each input has two lines
     */
    static unsigned limit = 2 * n;

    for(i = 0; i < limit; i += 2){
        getline(cin, slogan[i]);
        getline(cin, slogan[i+1]);
    }

    scanf("%u", &q);
    getchar();

    while(q--){
        getline(cin, query);

        /**
         * Create the DFA table from input
         */
        CreateDFA(query);

        for(i = 0; i < limit; i += 2){
            /**
             * If the first line matches a String stored in memory
             * then print the next one
             */
            if(DFAStringMatching(slogan[i], query))
                cout << slogan[i+1] << "\n";
        }
    }

    return 0;
}
Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s