Cuttlepress

Blogging about Hacker School?

Day 37

For the first part of the day, I made the writeup post for the Twitter project.

I paired with Stephen on his Zulip bot, which generates text based on a given set of Zulip streams. We worked on stripping code blocks from the input, decoding the HTML entities in the output, and making the input case-insensitive.

For my Hamlet project, I finished writing the bigram generation, added the function to tabulate the probability of the input string, and modularized my Metaphone code. I also put some thought into how scoring should work: players who type convincing phrases need to be rewarded accordingly. Scoring clearly needs to take the length of the text into account, because otherwise a single word from the corpus is going to score best. While this is the sort of thing that is best tuned with playtesting, here’s a first attempt at mapping a phrase’s probability value to a score (remember that the probabilities are stored as negative logarithms; that is, a lower value means a more probable phrase):

function calculateScore(text, value) {
  // An empty input earns nothing.
  if (text.length < 1) {
    return 0;
  }
  // Longer strings with smaller "values" are worth more points.
  var avgValue = value / text.length; // going to be somewhere between 0 and 11.2
  return Math.round(10 * (11.5 - avgValue) * text.length);
}
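
For anyone wondering where the "value" argument might come from, here is a rough sketch of a bigram negative-log-probability tabulation. It is not the project's actual code: the function names (buildBigramCounts, negLogProbability), the add-one smoothing, and the choice of natural log are all assumptions made for illustration.

```javascript
// Hypothetical helpers; names, smoothing, and log base are assumptions.

// Count how often each word and each adjacent word pair appears in the corpus.
function buildBigramCounts(words) {
  var unigrams = new Map();
  var bigrams = new Map();
  for (var i = 0; i < words.length; i++) {
    unigrams.set(words[i], (unigrams.get(words[i]) || 0) + 1);
    if (i + 1 < words.length) {
      var key = words[i] + " " + words[i + 1];
      bigrams.set(key, (bigrams.get(key) || 0) + 1);
    }
  }
  return { unigrams: unigrams, bigrams: bigrams };
}

// Sum -ln P(next | previous) over every adjacent pair in the phrase.
// Add-one smoothing keeps unseen pairs from producing -ln(0) = Infinity.
function negLogProbability(words, counts) {
  var vocabSize = counts.unigrams.size;
  var total = 0;
  for (var i = 0; i + 1 < words.length; i++) {
    var pairCount = counts.bigrams.get(words[i] + " " + words[i + 1]) || 0;
    var prevCount = counts.unigrams.get(words[i]) || 0;
    total += -Math.log((pairCount + 1) / (prevCount + vocabSize));
  }
  return total;
}

// Toy corpus: a pair that appears in the corpus gets a lower (better) value.
var counts = buildBigramCounts("to be or not to be".split(" "));
console.log(negLogProbability(["to", "be"], counts));
console.log(negLogProbability(["to", "or"], counts));
```

In this sketch a pair that occurs in the corpus ends up with a smaller value than one that never does, which is the property the scoring function relies on.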

I ran this on several input phrases, with and without Metaphoning everything. Output below the fold.

File read in: hamlet.txt
Text: whether tis nobler in the mind to suffer the slings and arrows of outrageous fortune
score: 1281
------------------
Text: whether tis nobler in the mend to sufer the slings and arrws of outrajs fortune
score: 900
------------------
Text: fourscore and seven years ago
score: 203
------------------
Text: strumpet
score: 9
------------------
Text: your orisons
score: 117
------------------
Text: 
score: 2
------------------
Text: lets all go to elsinore
score: 363
------------------
Text: lets all go to switzerland
score: 309
------------------
Text: lets all go to
score: 307
------------------
Text: to be or not to be or not to be or not to be or not to be or not to be or not to be or not to be or not to be or not to be or not to be or not to be or not to be
score: 3887
------------------
Text: call me ahab
score: 208
------------------
Text: call me maybe
score: 208
------------------
Text: call me elsinore
score: 229
------------------
Text: call me fellow
score: 266
------------------
________Let's try it Metaphoned!________
Text: w0r ts nblr in 0 mnt t sfr 0 slnks ant arws of otrjs frtn
score: 1266
------------------
Text: w0r ts nblr in 0 mnt t sfr 0 slnks ant arws of otrjs frtn
score: 1266
------------------
Text: frskr ant sfn yrs ak
score: 214
------------------
Text: strmpt
score: 9
------------------
Text: yr orsns
score: 117
------------------
Text: 
score: 2
------------------
Text: lts al k t elsnr
score: 360
------------------
Text: lts al k t swtsrlnt
score: 309
------------------
Text: lts al k t
score: 307
------------------
Text: t b or nt t b or nt t b or nt t b or nt t b or nt t b or nt t b or nt t b or nt t b or nt t b or nt t b or nt t b or nt t b
score: 3909
------------------
Text: kl m ahb
score: 208
------------------
Text: kl m mb
score: 208
------------------
Text: kl m elsnr
score: 229
------------------
Text: kl m flw
score: 261
------------------

As you can see, simply looping “to be or not to be or not to be [etc]” is a clear exploit. (Credit to Rishi.)
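
The reason the loop works is visible in the formula itself: since avgValue times text.length is just value, the score reduces to 10 * (11.5 * text.length - value). Any phrase whose per-unit value stays below 11.5 adds a fixed positive amount every time it is repeated. Here is a small sketch of that, with made-up numbers: the cost of 3.0 per word and the array-of-words input are assumptions for illustration, not measurements from the Hamlet corpus.

```javascript
// Same formula as calculateScore in the post, repeated so this runs on its own.
function calculateScore(text, value) {
  if (text.length < 1) {
    return 0;
  }
  var avgValue = value / text.length;
  return Math.round(10 * (11.5 - avgValue) * text.length);
}

// Hypothetical: assume every word of the looped phrase costs about 3.0.
var costPerWord = 3.0;
var phrase = ["to", "be", "or", "not"];

// Score the phrase repeated `repeats` times at a constant per-word cost.
function scoreForRepeats(repeats) {
  var words = [];
  for (var i = 0; i < repeats; i++) {
    words = words.concat(phrase);
  }
  return calculateScore(words, costPerWord * words.length);
}

// The score grows in direct proportion to the number of repetitions.
console.log(scoreForRepeats(1), scoreForRepeats(2), scoreForRepeats(13));
```

Under these assumptions, doubling the repetitions doubles the score, so there is no ceiling unless the formula penalizes repetition or caps the length.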