CS1101:Assignment 4

Strings
Assigned Tuesday, September 25, 2018
Due Thursday, October 4, 2018 at 11:59 PM

Make a pretty design; convert between Roman and Arabic numerals. Each problem has easier options and harder options.   The total number of points possible exceeds 100. 

1. A pretty design


(20 points) Write a function design(s) that takes a string s and displays it as follows (this shows the output when the argument s has the value 'Mississippi'.

M
MI
MIS
MISS
MISSI
MISSISS
MISSISSI
MISSISSIP
MISSISSIPP
MISSISSIPPI

OR




(30 points)  For some additional credit, have it do this instead.


M
MI
MIS
MISS
MISSI
MISSISS
MISSISSI
MISSISSIP
MISSISSIPP
MISSISSIPPI
 ISSISSIPPI
  SSISSIPPI
   SISSIPPI
    ISSIPPI
     SSIPPI
      SIPPI
       IPPI
        PPI
         PI
          I

Note.  Somewhat uncharacteristically, this function (in both versions) does not return a value, but actually prints something.

2.  Translate to and from Roman numerals.


Our system for writing numbers, often called Hindu-Arabic numerals, originated in India around the 8th century AD, was adopted in the Arab world, and from there spread to Europe in the 12th or 13th century.  It is a decimal, positional number system:  'decimal', because it is based on powers of 10, 'positional' because the value of a symbol depends upon its position in the string of symbols.  For example, in 2325, the '2' at the left represents two thousand, while the other 2 represents twenty.  This is in contrast to the system of Roman numerals, which is (mostly) a decimal additive system: The symbols M,D,C,L,X,V,I represent the values 1000,500,100,50,10,5 and 1 respectively.  It survives in the numbers of monarchs, popes and Super Bowls, on clock faces, and on the cornerstones of buildings.

In its simplest version, you would just take the string of numerals and add up the values associated to each symbol, so that CCCCXIDCM would represent 2011.  In practice, however, this is never done:  The symbols are ordered from the largest magnitude to the smallest, and we never have more than four occurrences of M,C,X,I, or more than one occurrence of D,L,V. An example is MDCCCCXXVII, on the bottom cornerstone in the figure:  This building was erected in 1927.

A variant, which is more commonly seen, is to use 'subtractive' notation.  Here 900,400,90,40,9 and 4 are represented by CM,CD,XC,XL,IX,IV.  This is illustrated in the top cornerstone in the figure: MCMXLVI represents 1946.  Here you will never have more than three consecutive occurrences of M,C,X or I.

In this series of exercises, you will write functions that convert between Hindu-Arabic numerals and Roman numerals.  There are several options, and the hardest options are pretty challenging.  Start with the simplest version of each problem and work your way up.  You will get credit for the hardest working version that you hand in.  There are some important hints at the bottom!

Hindu-Arabic to Roman (easier version)


(35 points) Write a function arabic_to_roman(n) that takes an integer as an argument and returns a string.  If the integer is negative, or 5000 or greater, the string that is returned should contain some sort of error message, like 'Out of Range'.  Otherwise, it should be the Roman numeral representation of n, without the subtractive notation, but with the conventions on order observed. 

The algorithm for doing this is to repeatedly divide n by 1000,500,100,50,10,5,1 in turn, replacing n by the remainder at each step.  The sequence of quotients gives you the number of M's, D's, C's, etc. in the translation to Roman numerals.  For example, if the argument is n=1927,  the sequence of divisions yields

Divisor
n
Remainder
Quotient
1000
1927
927
1
500
927
427
1
100
427
27
4
50
27
27
0
10
27
7
2
5
7
2
1
1
2
0
2

The result is thus 1 M, 1 D, 4 C, 2 X, 1 V, 2 I, just as on the cornerstone above.   Note that this problem requires integer division.

Hindu-Arabic to Roman (harder version)


(45 points) Now write a function with the same name that returns the Roman numeral representation with the subtractive notation:  So, whereas the previous version of the problem would give the output MDCCCCXXVII for 1927, the present version should give MCMXXVII.  This is just a little trickier:  You no longer need to divide by 500, 50 and 5, but when you divide by, say 100, you have to pay attention to whether the quotient is 9 or 4, and, if not, whether it is less than 5 or greater than or equal to 5.

Another strategy that you can try is to dig down a bit in the documentation for string methods, and look at the methods called find and replace.  You might be able to use these to search for patterns like DCCC in the strictly additive representation and replace it by CM.

Roman to Hindu-Arabic (easiest version)


(25 points) Write a function roman_to_arabic(s)  that takes a string  s of Roman numeral symbols and returns the corresponding integer.  In this version, you really just have to add up values of the symbols---no attention is paid to order.   Have the function return a special error code -1 if an illegal symbol (something other than M,D,C,L,X,V,I) occurs in s.

Roman to Hindu-Arabic (medium version)

(40 points) Rewrite the above function so that  the rules about ordering and the number of occurrences of each symbol are respected.  So, while in the previous version, the argument 'CCMXM' would result in 2210 being returned, this new version will return the error code -1 for this input. It will also return the error code for 'MMMMM'.

Roman to Hindu-Arabic (hard version)


(55 points) Same as above, but incorporate the subtractive notation.  So here MCMXXVII is legal, but MDCCCCXXVII is not.  (In the previous version, the reverse would be true.)


Helpful Hints


There is a risk in coding this that you wind up with long sequences of if..else statements.  You can get much more compact code if you include, right at the outset, a function that gives the value of each symbol (that is, a function that returns 1000 if the argument is 'M', 500 if the argument is 'D', etc.) and another function that computes the inverse of this (returns 'M' if the argument is 1000, etc.)  Each of these functions will have a single long sequence of if..else statements, but they make the code for the conversion functions much less repetitive.

Checking for correct syntax in the medium version of Roman to Hindu-Arabic can be a chore---you would have to look ahead and verify that the value associated to each character is not less than the value associated to the following character, make sure that you don't have 5 repetitions of an X or 2 repetitions of  an L, etc.  You can do this, but there is a slick alternative method if you also solved the conversion problem in the other direction.  Just compute the value using the easy version of Roman-to-Hindu-Arabic, then use your arabic_to_roman function to convert back to Roman numerals---the result should agree exactly with the string you started with.

In the hard version of Roman to Hindu-Arabic, the idea is to look ahead one character to see if a symbol with smaller value precedes a symbol with larger value (like X before C),  and adjust the computation of the value accordingly.  Getting this just right can be tricky. Another strategy is suggested above in the description of the harder version of Hindu-Arabic to Roman: You could use find and replace to hunt for occurrences of patterns like CM and IV and replace them by DCCC and IIII, then process the result using the solution for the simpler version.

By the way, my interpretation of the rules, both with and without subtractive notation, is quite strict:  If you're using subtractive notation, then DCCCC is not a permitted option. CIC (for 999) and CCCCC (for 500) are not allowed in either version of the notation. The result is that once you choose between using or not using the subtractive notation, there is only one correct way to encode an integer in Roman numerals.  Treatments of this programming problem that I have seen on the Web often do this incorrectly.