mbuswell, Madeline Buswell

My main comment is that this lab took kind of a long time; it was the most time-consuming for me out of all the labs we have done so far. Shortening it a little would be nice. Also, sometimes the directions were not 100% clear.

Part 1

The first thing I did was search for my name and my two sisters' names over time:

As expected, Olivia is the most popular, although I was surprised that Audrey was more popular than my name, Madeline. Perhaps it is because my name has so many alternate spellings.

Then I wanted to know how people write about men vs. women in fiction with respect to three traits: kindness, intelligence, and beauty. I used the dependency tool.

As (unfortunately) could have been predicted, women are only really written about as beautiful, but it was surprising that men were also highly prized for their handsomeness. So the next thing I did (which to be honest is probably what I should have done from the start) was use the dependency tool in combination with the wildcard tool to see which adjectives other than the three I looked at were used for men and women in fiction.

And it turned out that all people really care about for both men and women is their age.

Part 2

I chose to look at Machiavelli's "The Prince", from Project Gutenberg. I have never read it before, but I saw it on their list of most popular downloads and thought it would be interesting.

In the word cloud above, you can see that "prince" is (naturally) the most frequently used word in the text, but that "men" and "Castruccio" are not far behind. In comparison to works of fiction like "Pride and Prejudice" or "Howl's Moving Castle" (which I did look at briefly before deciding to choose something else), words like "said" and "thought" aren't up there, since they aren't as necessary in this kind of work. And (somewhat obnoxiously), "Gutenberg" is quite large, indicating that perhaps there is a lot of filler from the website in the .txt file, and it might be better to do this with a more "pure" file.
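One way to get a more "pure" file is to cut the website filler before counting words. The sketch below is only an illustration, not what the word cloud tool does internally. It assumes the .txt file wraps the book in "*** START OF THE PROJECT GUTENBERG EBOOK ..." and "*** END OF THE PROJECT GUTENBERG EBOOK ..." lines (older files may word these differently), and the function names and the sample text are made up for the example.

```python
import re
from collections import Counter

# Assumed marker wording; check the actual file, since older downloads may differ.
START_RE = re.compile(r"\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*", re.IGNORECASE)
END_RE = re.compile(r"\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*", re.IGNORECASE)


def strip_gutenberg_filler(text):
    """Keep only the text between the start and end marker lines, if present."""
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    finish = end.start() if end else len(text)
    return text[begin:finish]


def top_words(text, n=10, stop_words=frozenset()):
    """Return the n most common lowercase words, skipping the given stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words).most_common(n)


# Tiny made-up sample standing in for the real download.
sample = (
    "The Project Gutenberg eBook of The Prince\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK THE PRINCE ***\n"
    "A prince ought to study the actions of great men. The prince must act.\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK THE PRINCE ***\n"
    "Project Gutenberg license text follows.\n"
)
body = strip_gutenberg_filler(sample)
print(top_words(body, n=3, stop_words={"the", "a", "to", "of"}))
```

With the filler removed, "Gutenberg" no longer competes with words that actually come from the book.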

Surprisingly, you can see in the image above (from the "Stream Graph" tool) that "Castruccio", while it is one of the most common words, only really appears in the last third or so of the text; it just appears there A LOT.
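The same pattern can be checked numerically by cutting the text into equal slices and counting one word in each slice. This is a rough sketch of the idea behind that view, not the Stream Graph tool's actual method; the function name and the toy text are assumptions for illustration.

```python
import re


def counts_by_segment(text, word, segments=10):
    """Count how often `word` appears in each of `segments` equal slices of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = [0] * segments
    if not tokens:
        return counts
    target = word.lower()
    for i, tok in enumerate(tokens):
        if tok == target:
            # Map the token position to a slice index between 0 and segments - 1.
            counts[i * segments // len(tokens)] += 1
    return counts


# Toy text: one word appears throughout the start, the other only at the end.
toy = "prince " * 6 + "castruccio " * 3
print(counts_by_segment(toy, "castruccio", segments=3))
print(counts_by_segment(toy, "prince", segments=3))
```

A word that is frequent overall but concentrated in one place shows up as zeros in most slices and a large count in the rest.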

"Textual Arc" was a fun way to see the progression of words through the text, and their relationships to one another. Words like "Gutenberg" are on the outer rim of the circle, since (even though they appeared often) they were more isolated appearances than, say, "prince", which was right near the middle. I liked watching the cursor zoom around between them, especially at higher speeds. It reminded me of images of neurons firing in the brain.

I also found the "Bubbles" tool particularly fun. It showed words in bubbles changing size in proportion to one another as the analysis progressed through the book; so, for example, you can see that "Chapter" is initially very large, since the book begins with the table of contents, but then it decreases to the point where it isn't even visible halfway through. On its own, this tool is admittedly probably not very helpful in formal analysis, but it is a fun visual representation of the information expressed numerically in other views.

Part 3

Words which can be either positive or negative:

I'm not sure if this is what the question is asking for, but I looked at their list (via "Page Source" and the javascript), and "zealous" is positive, while "zealot" is negative, which seems odd, since the second is literally just the noun version of the first.

Also, it's not quite the same, but "winwin" is less positive than "win", which also doesn't make sense to me since I always thought the former was the stronger adjective.

Words with weights which seem wrong:

One word which is completely missing is "ideal"; it's one of my favorite positive adjectives.

"Flagship" has a level 2 positive connotation, which I don't get. I thought it just meant "first" (as in "flagship store"), and that's pretty neutral. I might not really know what it means, though.

They both agree and are correct:

A text I sent to my sister:

"And to be clear, it isn't a waste to change majors and have to take a lot of new courses. If you are learning something new you are getting your money's worth. The vast majority of my classes are ones I didn't need, or that I needed but then I didn't bc I switched majors"

Sentimood and the demo service both agreed that it was positive, which is true (I was trying to be encouraging). I wasn't sure they would, because to be honest I myself think it sounds a little harsh, although I wasn't trying to be.

A text I sent to a friend:

"Claudia and her roommates are having a party in her room tonight and you are invited to attend"

They both said it had no sentiment, which is true (I was trying to be deadpan on purpose). I thought the "party" might get them to say positive, though, especially Sentimood.

They both agree and are wrong:

A text sent to me by my sister in the same conversation as above:

"Like I just feel like I'm wasting money lol, because based on what I've found so far having a year of history won't help me to change courses"

They both said it was positive, when it is very clearly negative. It was her "lol" and "like"s that did it.

A text sent to me by my friend:

"I cannot stop listening to all too well (10 minute version)"

They both thought it was negative, in Sentimood's case because of the "stop", but she was being positive (she can't stop listening to it because she enjoys it).

They disagree:

Another text from my sister:

"Yeah I didn't really like the cruel prince either like I've never even finished it because I've just never been able to get through it"

Sentimood thought she was being positive, again because of the "like"s, but the demo got it right that she was being negative and saying she DIDN'T like something.

They also disagree on "Don't worry about it!" Sentimood thinks it is negative, because of the "worry", but the demo recognized that it was DON'T worry.
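The mistakes above are what you would expect from a scorer that just adds up the weights of individual words. Here is a minimal sketch of that idea next to a version that flips the next weighted word after a negation. The weights and the negator list are hypothetical values picked for illustration, not copied from Sentimood's list, and I don't know how the demo service actually works.

```python
import re

# Hypothetical weights for illustration only; not Sentimood's real list.
LEXICON = {"like": 2, "lol": 3, "stop": -1, "worry": -3}
# Hypothetical set of negation words.
NEGATORS = {"not", "don't", "didn't", "never", "cannot"}


def tokenize(text):
    """Lowercase the text and split it into words, keeping apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text):
    """Add up word weights with no attention to context."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))


def negation_aware_score(text):
    """Same sum, but a negator flips the sign of the next weighted word."""
    score = 0
    negate = False
    for tok in tokenize(text):
        if tok in NEGATORS:
            negate = True
            continue
        weight = LEXICON.get(tok, 0)
        if weight:
            score += -weight if negate else weight
            negate = False
    return score


for text in ["Don't worry about it!", "I cannot stop listening to all too well"]:
    print(text, naive_score(text), negation_aware_score(text))
```

Even this crude flip handles "Don't worry" and "cannot stop", but it still would not catch filler words like "lol" and "like", which carry no real sentiment in my sister's texts.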

Part 4

I am going from English to French and back again, and I am using Google Translate and Bing.

"It's raining cats and dogs" works well in both Google and Bing: Google gives "it's raining buckets" in French, which makes sense, and Bing gives "it's raining cords", which is the actual equivalent French idiom (or at least, that's what my high school French teacher said). But either way, the system knows what it corresponds to in English no matter how many times you flip back and forth.

"Break a leg", however, is horrible in Google Translate, which translates it literally ("casser une jambe", to break a leg). Bing does much better, returning "Bonne chance" ("good luck").

"I'm pulling your leg", which in English we understand to mean "I'm joking", is too much even for Bing, which up until now had been doing well with idioms. Both Google and Bing treat it literally and spit out "je te tire la jambe".

It seems like while Bing is slightly better than Google, they both need some work before they will be truly useful in practice. Bing is probably only better than Google by chance; it would be interesting to run a real experiment with a large quantity of idioms and see which one was actually better. And in fact, there is potentially a problem with them getting it right some of the time, since somebody could assume that this means they would get it right all of the time, and then be horribly embarrassed when they tried to speak.
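A real experiment could be set up roughly like this: for each idiom, write down the word-for-word French rendering, then measure how often each system falls back on it. The sketch below does not call Google or Bing. The two "translators" are stand-in lookups that replay the outputs I recorded above, and the function name is made up; with a real list of idioms you would swap in whatever translation interface you have access to.

```python
def literal_rate(translate, literal_renderings):
    """Fraction of idioms the system rendered word for word (lower is better)."""
    if not literal_renderings:
        raise ValueError("need at least one idiom")
    misses = sum(
        translate(idiom).strip().lower() == literal.strip().lower()
        for idiom, literal in literal_renderings.items()
    )
    return misses / len(literal_renderings)


# Word-for-word renderings of the two idioms that caused trouble.
LITERAL = {
    "Break a leg": "casser une jambe",
    "I'm pulling your leg": "je te tire la jambe",
}

# Stand-ins that replay the outputs recorded in this report (no real API calls).
GOOGLE_OBSERVED = {
    "Break a leg": "casser une jambe",
    "I'm pulling your leg": "je te tire la jambe",
}
BING_OBSERVED = {
    "Break a leg": "Bonne chance",
    "I'm pulling your leg": "je te tire la jambe",
}

print("Google:", literal_rate(GOOGLE_OBSERVED.get, LITERAL))
print("Bing:", literal_rate(BING_OBSERVED.get, LITERAL))
```

Two idioms are far too few to say anything, which is the point: the same function run over a few hundred idioms would show whether Bing's lead is real or just chance.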

Part 5

Experiment 1

I taught the program how to recognize me and my sister Audrey. It only took three photos of each of us, and it even recognized me with my mask up (perhaps indicating that I have very distinctive eyes).

Experiment 2

I taught the program how to tell how many fingers I was holding up (0-5). This took many more photos to work, 15 for each, but it did eventually succeed.