Note: Original by Mithryn, and republished here with permission.
The critics make an assumption that Melania Trump must have heard Michelle Obama's speech
People conclude that Mrs. Trump heard Mrs. Obama's speech while in school (without any evidence that Melania ever actually heard the speech). They base this conclusion on the assumption that the speech was widely available, and therefore Melania must have heard it. Hence, they conclude that Melania constructed her speech by using structural elements of Michelle Obama's speech. The evidence is presented as a series of comparisons between Mrs. Trump's speech and Michelle Obama's speech.
When we analyze or compare writing, often what seems intuitive or logical isn't
This happens because we don't normally compare speeches in this way - we only do it to answer specific kinds of questions. And since most of us never really do it at all, we have no idea what we should really expect. So we don't usually know how to evaluate this kind of thing. This is why there needs to be an explanation, and why, after the findings were challenged, it didn't move forward towards a formal publication.
The basic premise behind this sort of study is an act of comparison
We compare a whole bunch of things, and see which are the most similar. This can be a bit misleading.
If one takes a box full of forks and then tosses in a spoon, one can compare all the forks to see which one is most like the spoon.
But will this make the spoon a fork or the fork a spoon?
The funny thing about this study is that if we removed Mrs. Obama's speech from the list of potential sources, the model would still kick out another speech that was most like Melania Trump's speech, and if we removed that one, we would still get another, and so on. What this modeling cannot tell us is how alike they really are. To create a kind of visual image, the usual way to deal with this is to create a space around the speech (draw a circle or a sphere around it) and treat that boundary as the outside limit for what we think would indicate a connection. If the closest speech falls within that circle, we look for more information. If it falls outside, we conclude that there was no likely connection. But without such a mechanism (and there was no such mechanism in this study), we will always get a closest speech, no matter what our options are.
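The point about always getting a closest speech can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word-set (Jaccard) distance, the function names, the toy texts, and the `max_distance` cutoff are all invented for the example and are not the method used in the study being discussed.

```python
def jaccard_distance(a: str, b: str) -> float:
    """Distance in [0, 1] between the word sets of two texts (0 = same vocabulary)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    if not union:
        return 0.0
    return 1.0 - len(words_a & words_b) / len(union)


def closest_source(target, candidates, max_distance=None):
    """Return (name, distance) of the nearest candidate.

    With max_distance set (the "circle" around the target), return None
    when even the nearest candidate falls outside it.
    """
    if not candidates:
        return None
    name, dist = min(
        ((n, jaccard_distance(target, text)) for n, text in candidates.items()),
        key=lambda pair: pair[1],
    )
    if max_distance is not None and dist > max_distance:
        return None
    return name, dist


# Toy data, invented for illustration: one "spoon" and a box of "forks".
target = "the spoon is round and shallow"
candidates = {
    "fork_a": "the fork has four sharp tines",
    "fork_b": "a fork is long and narrow",
}

print(closest_source(target, candidates))                    # some fork always wins
print(closest_source(target, candidates, max_distance=0.5))  # None: outside the circle
```

Without the cutoff, removing the winning fork simply promotes the next one. Only the cutoff lets the comparison answer "nothing here is close enough."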
The YouTube clip makers employ a fallacy that is called the Texas Marksman (or the Texas Bulls Eye)
In trying to make the best argument, they give us these lists of similarities. Presenting the lists this way commits a fallacy called the Texas Marksman (or the Texas Bulls Eye), more commonly known as the Texas sharpshooter fallacy. Essentially, the way the reference works is that you shoot a bunch of rounds into the side of your barn, and then you go up to the holes and paint your target around them (giving you the best and tightest clustering). Usually, the way these models work in accepted applications is that you start by testing the model in situations where you already know the outcome. That way, you can see how reliable your new model is. If it is highly reliable in known cases, then you can start cautiously applying it to unknown cases (you don't create your own target this way). By intuiting that it must be right, the model used with Mrs. Obama's speech simply skipped the testing part. This created one of the biggest and most obvious problems with the theory.
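A rough sketch of what "testing the model where you already know the outcome" might look like follows. Everything in it is an assumption made for illustration: the `word_overlap` score, the 0.5 threshold, and the four labeled pairs are invented, and none of it comes from the study under discussion.

```python
def word_overlap(a: str, b: str) -> float:
    """Similarity in [0, 1]: shared words divided by all distinct words."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def evaluate_on_known_cases(score, threshold, known_cases):
    """Fraction of labeled pairs that the score-plus-threshold rule gets right.

    known_cases is a list of (text_a, text_b, is_related) tuples whose
    answer is already known before the model is run.
    """
    if not known_cases:
        raise ValueError("need at least one known case")
    correct = sum(
        (score(a, b) >= threshold) == is_related
        for a, b, is_related in known_cases
    )
    return correct / len(known_cases)


# Hypothetical labeled pairs; the last one shares only stock phrasing.
known_cases = [
    ("the cat sat on the mat", "the cat sat on a mat", True),
    ("the cat sat on the mat", "stock prices fell sharply today", False),
    ("rain is expected tomorrow", "rain is expected tomorrow evening", True),
    ("we value hard work", "they value hard work too", False),
]

accuracy = evaluate_on_known_cases(word_overlap, 0.5, known_cases)
print(accuracy)
```

The target is painted before the shots are fired: the labels exist first, and the model either hits them or it doesn't. In this toy set the rule is fooled by the pair that merely shares common phrasing, which is exactly the kind of weakness a known-case test is meant to expose.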
They didn't stop with Melania Trump's speech. They ran a test on a Rick Astley music video, and found a source (a relatively unknown video from the 1980s). Why is this important?
Rick Astley was a prolific recording artist, and his video was used in thousands of internet posts prior to the convention.
We have a huge body of literature devoted to dealing with his video (his was one of the most important music videos of the period). So when you have a statistical model that produces a brand new source, not noticed by anyone previously, not mentioned in any of the YouTube clips, and so on, there ought to be a bit of a red flag raised. But there wasn't. Had this theory been introduced to academic literary theorists, this would have been the major point of dispute (since they don't really care about Melania Trump's speech).
Did this model really find a previously unknown and unidentified source of Rick Astley's work?
Or did it simply create the illusion of doing this by painting a bull's-eye after clustering its data?
I am pretty confident it was the second option here. (As a side note, discovering a new source for Rick Astley would be a thesis-worthy discovery.)
Computer modeling tends to get rid of boundaries, so it doesn't help us visualize the data density
Finally, computer modeling tends to get rid of boundaries. That is, we can take this whole pile of material, and it looks important, but it doesn't help us visualize the data. For example, those various parallels are placed into the end of the speech, which only represents a page or a page and a half in the text. How densely the material gets used does matter. So we have this list of phrases.
If we take all the four-word sequences in Michelle Obama's speech and look for them in Melania Trump's speech, Trump uses a four-word sequence from Obama fewer than once for every 400 different four-word phrases.
Where do the other 399 phrases come from? At some point, we are going to find a bunch that occur simply by chance. In this case, there will be more that are connected through political language. But if we take each one and start to compare the content, do we find them to be similar enough to make such claims? It is likely that we don't.
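To make the "one in 400" kind of figure concrete, here is a minimal sketch of how a shared four-word-sequence rate could be counted. The tokenizer, the function names, and the two toy sentences are assumptions made up for the example, and it does not reproduce the numbers quoted above.

```python
import re


def word_ngrams(text: str, n: int = 4) -> set:
    """Set of distinct n-word sequences in a text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngram_rate(target: str, source: str, n: int = 4):
    """Return (rate, shared): the share of the target's distinct n-grams found in the source."""
    target_grams = word_ngrams(target, n)
    if not target_grams:
        return 0.0, set()
    shared = target_grams & word_ngrams(source, n)
    return len(shared) / len(target_grams), shared


# Toy sentences, invented for illustration only.
target = "The quick brown fox jumps over the lazy dog today."
source = "Yesterday the quick brown fox slept under the lazy dog."

rate, shared = shared_ngram_rate(target, source)
print(f"{len(shared)} shared of {len(word_ngrams(target))} four-word sequences ({rate:.1%})")
```

A rate on its own says little. To judge whether it is more than chance, the same count would need to be run against texts known to be unrelated (other political speeches, say) and the results compared.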
Compare here: http://en.fairmormon.org/Book_of_Mormon/Plagiarism_accusations/The_Late_War
When is plagiarism so obvious that everyone can see it? Why can't FAIRMormon see it when it is far more frequent, and statistically more significant, than a few lines in a single speech that the entire world immediately gets?