UVA Problem 1368 – DNA Consensus String Solution

UVA Problem 1368 – DNA Consensus String Solution:


Click here to go to this problem in uva Online Judge.

Solving Technique:

Rank 174, Run time 12 ms ( 0.012 s ).

First input is number of test cases. Next of two inputs one is number of strings to input and another is the size of string. Our strings will only contain A, C, G, T characters.

For solving this problem we first need to find consensus string. That is of all strings we check each column of all strings and count the maximum occurring character in that column. This character is our consensus string character for that column.

Consensus string size is same as the input string size. Since each of its characters are maximum occurring characters in the input string. Example,

TATGATAC
TAAGCTAC
AAAGATCC
TGAGATAC
TAAGATGT
========
TAAGATAC    /* Consensus String, check each column for max character count */

Now that we have found the consensus string we just need to find consensus error. I will only explain the logic here, I have commented my code below.

TATGATAC
TAAGCTAC
AAAGATCC
TGAGATAC
TAAGATGT
========
TAAGATAC    /* Consensus String */
========
11101021    /* Consensus Error, 1+1+1+0+1+0+2+1 = 7, count every character of a column that don't match that consensus string character */

Now if you understood my explanation time to go write a better solution :).

Important:  Be sure to add or print a new line after each output unless otherwise specified. The outputs should match exactly because sometimes even a space character causes the answer to be marked as wrong answer.


Input:

3 
5 8 
TATGATAC 
TAAGCTAC 
AAAGATCC 
TGAGATAC 
TAAGATGT 
4 10 
ACGTACGTAC 
CCGTACGTAG 
GCGTACGTAT 
TCGTACGTAA 
6 10 
ATGTTACCAT 
AAGTTACGAT 
AACAAAGCAA 
AAGTTACCTT 
AAGTTACCAA 
TACTTACCAA

Output:

TAAGATAC 
7 
ACGTACGTAA 
6 
AAGTTACCAA 
12

Code:

/*
 * Author: Quickgrid ( Asif Ahmed )
 * Site: https://quickgrid.wordpress.com
 * Problem: UVA 1368 ( DNA Consensus String )
 */

#include<stdio.h>

int main(){
    register unsigned n, a, b, i, j, k, error;
    scanf("%u", &n);

    while(n--){
        scanf("%u%u", &a, &b);

        /**
         * Create the string array
         */
        char **s = new char*[a+1];
        for(i = 0; i < a; ++i){
            /**
             * Allocate space for each string
             */
            s[i] = new char[b+1];
            /**
             * Input a string in i th position
             */
            scanf("%s", s[i]);
        }

        /**
         * Consensus string declaration
         */
        char *conseq = new char[b+1];
        /**
         * K is used below
         */
        for(k = i = 0; i < b; ++i){
            /**
             * Reset A,C,G,T count array
             */
            unsigned int seq[4] = {0};
            for(j = 0; j < a; ++j){
                switch(s[j][i]){
                case 65:    /* 65, 'A' here seq[0] is A */
                    ++seq[0];
                    break;
                case 67:    /* 'C' */
                    ++seq[1];
                    break;
                case 71:    /* 'G' */
                    ++seq[2];
                    break;
                case 84:    /* 'T' */
                    ++seq[3];
                }
            }

            unsigned max = seq[0], maxIndex = 0;
            for(j = 1; j < 4; ++j){
                if(seq[j] > max){
                    /**
                     * Find the max count of A,C,G,T in a column, in all strings
                     */
                    max = seq[j];
                    /**
                     * Remember the maxIndex for creating consensus string
                     */
                    maxIndex = j;
                }
            }

            /**
             * We keep adding to our consensus string the most count of A,C,G,T
             */
            switch(maxIndex){
            case 0:
                conseq[k++] = 65;
                break;
            case 1:
                conseq[k++] = 67;
                break;
            case 2:
                conseq[k++] = 71;
                break;
            case 3:
                conseq[k++] = 84;
            }
        }

        for(error = i = 0; i < b; ++i){
            for(j = 0; j < a; ++j){
                /**
                 * Calculate the error count by moving column by column of all strings, then comparing against the consensus string character in that column
                 */
                if(s[j][i] != conseq[i])
                    ++error;
            }
        }

        /**
         * Print the consensus string and error count EACH ON A NEW LINE
         */
        printf("%s\n%u\n", conseq, error);
    }
    return 0;
}
Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s