Post List

Mathematics posts

• February 17, 2012
• 10:46 AM
• 970 views

A Scientist’s Worst Nightmare

A detailed analysis of a new infographic dealing with the issue of falsification in scientific research.... Read more »

• February 6, 2012
• 04:00 AM
• 851 views

17 and sudoku clues [video] | @GrrlScientist

17 is the minimum number of clues required to give a unique sudoku solution -- but how did mathematicians prove this? ... Read more »

Gary McGuire, Bastian Tugemann, & Gilles Civario. (2012) There is no 16-Clue Sudoku: Solving the Sudoku Minimum Number of Clues Problem. arXiv. info:/arXiv:1201.0749v1

• February 2, 2012
• 10:35 AM
• 624 views

Numerical evidence for the square root of a Wiener process

Brownian motion is a very accommodating mathematical object, lending itself readily to numerical simulation. Simulations exist aplenty for any platform and software package, so one can check very rapidly whether a given hypothesis holds up. For these purposes I have found the demonstration site by Wolfram very helpful, and [...]... Read more »
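A Wiener process is straightforward to simulate by cumulatively summing independent Gaussian increments; a minimal sketch (not tied to the Wolfram demonstration, and with illustrative step counts) might look like this:

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate a standard Wiener process W(t) on [0, 1] by cumulatively
# summing independent Gaussian increments with variance dt.
n_steps = 1000
dt = 1.0 / n_steps
increments = rng.normal(0.0, np.sqrt(dt), n_steps)
w = np.concatenate(([0.0], np.cumsum(increments)))

# Sanity checks: W(0) = 0, and Var[W(1)] should be close to 1 across
# many independent paths.
paths = rng.normal(0.0, np.sqrt(dt), (500, n_steps)).cumsum(axis=1)
print(f"W(0) = {w[0]}, sample Var[W(1)] = {paths[:, -1].var():.2f}")
```

This is exactly the sort of quick check the post describes: a few lines suffice to verify that a simulated process has the statistical properties a hypothesis demands.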

• January 31, 2012
• 06:04 AM
• 636 views

Quantum mechanics and stochastic processes: Revised paper posted

After having fixed the definition of the extended Itō integral, I have posted a revised version of my paper on arXiv (see here). The idea has been described here. A full account of this story is given here. The interesting aspect from a physical standpoint is the space that is fluctuating both for a Wiener [...]... Read more »

• January 31, 2012
• 03:29 AM
• 440 views

Voodoo Neuroscience Revisited

Two years ago, neuroscientists were shaken by the appearance of a draft paper showing that half of the published work in a particular field had fallen prey to a major statistical error. Originally called "Voodoo Correlations in Social Neuroscience", it ended up with the less snappy name of "Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition". I prefer the old title.

The error in question is now known variously as the "circular analysis problem", the "non-independence problem" or "double-dipping", although I still call it the "voodoo problem". In a nutshell, it arises whenever you take a large set of data, search for data points that are statistically significantly different from some baseline (null hypothesis), and then go on to perform further statistics only on those significant data points.

The problem is that when you picked out the statistically significant observations, you selected the data points that were especially "good", so if you then do more analyses only on those data, you are almost guaranteed to find something "good". To avoid this, you need to make sure that your second analysis is truly independent of your first one.

Anyway, Vul and Pashler, the main authors of the original voodoo article, have just written a short piece in NeuroImage offering some reflections on the paper and its aftermath. They don't make any major new arguments, but it's a good read. Particularly fun is their explanation of what inspired them to look into the voodoo problem:

In early 2005 a speaker in our department reported that BOLD activity in a small region of the brain can account for the great majority of the variance in speed with which subjects walk out of the experiment several hours later (this finding was never published as far as we know). The implications of this result struck us as puzzling, to say the least: Are walking speeds really so reliable that most of their variability can be predicted? Does a focal cortical region determine walking speeds? Are walking speeds largely predetermined hours in advance? These implications all struck us as far-fetched...

But they reveal that it was one paper in particular that set them off voodoo-hunting:

Our interest in probing the matter was further whetted by an episode occurring a short while later: Grill-Spector et al. (2006) reported that individual voxels in face-selective regions have a variety of stable stimulus preferences; in a critical commentary, Baker et al. (2007) found that the analysis used to ascertain this fact implicitly built these conclusions into the method, such that the same analysis applied to noise data (voxels from the nasal cavity) revealed a similar variety of stable preferences. It occurred to us that a similar circularity might underlie the puzzlingly high correlations.

To their credit, Grill-Spector et al. quickly accepted Baker et al.'s criticism and admitted that some of their original conclusions had been wrong.

Vul, E., & Pashler, H. (2012). Voodoo and circularity errors. NeuroImage. DOI: 10.1016/j.neuroimage.2012.01.027... Read more »
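The selection effect is easy to demonstrate on pure noise. Here is a minimal sketch (my own illustration, not from the paper, with made-up sample sizes): we correlate thousands of "voxels" of random data with a random "behavior" score, keep only the "significant" ones, and then report the correlation of the survivors.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_voxels = 20, 5000

# Pure noise: no voxel is truly related to "behavior".
behavior = rng.standard_normal(n_subjects)
voxels = rng.standard_normal((n_subjects, n_voxels))

# Pearson correlation of every voxel with behavior.
bz = (behavior - behavior.mean()) / behavior.std()
vz = (voxels - voxels.mean(axis=0)) / voxels.std(axis=0)
r = (vz * bz[:, None]).mean(axis=0)

# Circular step: keep only "significant" voxels (|r| above a cutoff),
# then report the mean correlation of the selected voxels.
selected = np.abs(r) > 0.5
print(f"voxels selected from noise: {selected.sum()}")
print(f"mean |r| among selected:    {np.abs(r[selected]).mean():.2f}")
```

Even though every correlation is spurious, some voxels clear the cutoff by chance, and averaging only over those survivors yields an impressively large correlation, which is exactly the voodoo problem.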

• January 26, 2012
• 12:07 PM
• 621 views

Groups in $\beta \mathbb{N}$

Lecture notes and video to a talk on an old result by Hindman and Pym on groups in the Stone-Cech compactification of the natural numbers.... Read more »

Hindman, N., & Pym, J. (1984) Free groups and semigroups in βN. Semigroup Forum, 30(1), 177-193. DOI: 10.1007/BF02573448

• January 25, 2012
• 05:22 AM
• 614 views

Quantum mechanics and the square root of Brownian motion

There is a very good reason why I was silent in the past days. The reason is that I was involved in one of the most difficult articles to write since I started doing research (and that is more than twenty years now!). This paper arose during a very successful collaboration with two colleagues of mine: [...]... Read more »

Farina, A., Giompapa, S., Graziano, A., Liburdi, A., Ravanelli, M., & Zirilli, F. (2011) Tartaglia-Pascal’s triangle: a historical perspective with applications. Signal, Image and Video Processing. DOI: 10.1007/s11760-011-0228-6

• January 24, 2012
• 02:08 AM
• 509 views

temperature aNOMalies

If you are new to climate science, you might be wondering what, exactly, this ‘temperature anomaly’ thing is that you keep hearing about. I know I was a bit confused at first! This post explains the concept, using a real-world example. Cities tend to be warmer than their surrounding countrysides, a fact known as the [...]... Read more »
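The concept is simple arithmetic: an anomaly is an observed value minus the long-term average ("baseline") for the same place and time of year. A tiny sketch with made-up numbers (not from the post's real-world example):

```python
# Hypothetical monthly temperatures (deg C) for one station, and a
# baseline period used to define "normal" for each month (all made up).
baseline = {1: -2.0, 2: -1.0, 3: 4.0}   # long-term monthly means
observed = {1: -0.5, 2: -2.5, 3: 5.0}   # this year's readings

# The anomaly is simply observed minus the baseline mean for that month.
anomalies = {m: observed[m] - baseline[m] for m in observed}
print(anomalies)  # {1: 1.5, 2: -1.5, 3: 1.0}
```

Subtracting each location's own baseline is what lets stations in warm cities and cold countryside be compared on the same footing: both report how far they departed from their own normal.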

• January 20, 2012
• 10:00 AM
• 685 views

Ian Stewart’s Mathematics of Life

This post is based on a book review I recently wrote on The Mathematics of Life, by Ian Stewart. A final version of the review will appear in a future issue of SIGACT News.  Please feel free to download a … Continue reading →... Read more »

Ian Stewart. (2011) The Mathematics of Life. Book: ISBN 0465022383. info:/

• January 17, 2012
• 09:35 AM
• 453 views

PCA and PCoA explained

Just before Christmas I was asked to talk to our molecular biologists about multivariate analyses. I was reminded of this on Thursday afternoon, when I saw that I had to talk to them on Friday. "Ah, no problem", I thought....... Read more »

Gower, J.C. (2005) Principal Coordinates Analysis. Encyclopedia of Biostatistics. info:/10.1002/0470011815.b2a13070

• January 16, 2012
• 09:51 AM
• 989 views

Is this journal for real?

This year 134 suspect new journals have appeared from the abyss, all published by the same clandestine company “Scientific & Academic Publishing, USA“. Scientists have been quick to raise the alarm and ruthless in their response.... Read more »

Morrison, Heather. (2012) Scholarly Communication in Crisis. Freedom for scholarship in the internet age. Simon Fraser University School of Communication. info:/

• January 15, 2012
• 10:53 AM
• 676 views

The beautiful (numbers) game…

In 2006 two nations took to the field in Berlin, Germany in front of a worldwide audience of 715 million people. Italy were to play France in the final of the FIFA World Cup. The match itself would later become famous for that "head butt" by France's Zinédine Zidane. But despite being eclipsed by a [...]... Read more »

• January 12, 2012
• 12:24 PM
• 633 views

Catch an "astrotweeter" with "Truthy"

A research group at Indiana University has developed a program called Truthy that allows anyone to track cases of "astroturfing" on Twitter. Any search term can be entered into Truthy, and the program will query the Twitter API and build a model of how the search term originated. ... Read more »

Ratkiewicz, J., Conover, M., Meiss, M., Gonçalves, B., Patil, S., Flammini, A., & Menczer, F. (2011) Truthy: Mapping the Spread of Astroturf in Microblog Streams. World Wide Web Conference Committee (IW3C2). info:/

• January 12, 2012
• 08:00 AM
• 702 views

Error bars

Averages should be one of the easiest kinds of information to show, but they are surprisingly tricky.

Most people know that when they show an average, there should be an indication of how much smear there is in the data. It makes a huge difference to your interpretation of the information, particularly when glancing at the figure. For instance, I'm willing to bet most people looking at this...

Would say, "Wow, the treatment is making a big difference compared to the control!"

I'm likewise willing to bet most people looking at this (which plots the same averages)...

Would say, "There's so much overlap in the data, there might not be any real difference between the control and the treatments."

The problem is that error bars can represent at least three different measurements (Cumming et al. 2007):

Standard deviation
Standard error
Confidence interval

Sadly, there is no convention for which of the three one should add to a graph. There is no graphical convention to distinguish these three values, either. Here's a nice example of how different these three measures look (Figure 4 from Cumming et al. 2007), and how they change with sample size:

I often see graphs with no indication of which of those three things the error bars are showing! And the moral of the story is: Identify your error bars! Put it in the Y axis label or in the caption for the graph.

Reference

Cumming G, Fidler F, & Vaux D. (2007) Error bars in experimental biology. The Journal of Cell Biology, 177(1), 7-11. DOI: 10.1083/jcb.200611141

A different problem with error bars is here.... Read more »

Cumming G, Fidler F, & Vaux D. (2007) Error bars in experimental biology. The Journal of Cell Biology, 177(1), 7-11. DOI: 10.1083/jcb.200611141
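The three measures are related by simple formulas, which is part of why they are so easy to confuse on a graph. A minimal sketch with made-up measurements (not data from the cited paper):

```python
import numpy as np

# Made-up measurements for one group (illustrative only).
data = np.array([4.1, 5.3, 4.8, 5.9, 4.4, 5.1, 4.7, 5.5])
n = data.size
mean = data.mean()

sd = data.std(ddof=1)     # standard deviation: spread of the raw data
sem = sd / np.sqrt(n)     # standard error: uncertainty of the mean
ci95 = 1.96 * sem         # ~95% CI half-width (large-sample normal
                          # approximation; for n this small a t critical
                          # value of about 2.36 would be more accurate)

print(f"mean = {mean:.2f}")
print(f"SD = {sd:.2f}, SEM = {sem:.2f}, 95% CI half-width = {ci95:.2f}")
```

Note how different the bar lengths would be: the SD bar is sqrt(n) times longer than the SEM bar, and the 95% CI is about twice the SEM, which is exactly why a caption must say which one is plotted.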

• January 4, 2012
• 12:34 AM
• 658 views

Why men don't listen and women are great at maths

• January 3, 2012
• 02:43 PM
• 857 views

Chimps Prefer the 2-Point Conversion

If non-human great apes were coaching more football games, you could expect to see fewer extra points being kicked. We risk-averse humans usually prefer kicking an easy extra point after a touchdown, rather than attempting a more difficult 2-point conversion. But chimps and other great apes, after considering their odds, usually opt for the greater risk and the bigger reward.

By "reward," I mean banana.

Researchers at the Max Planck Institute in Germany tested a group of chimpanzees, bonobos, gorillas and orangutans on their risk-taking strategies using chunks of banana. They wanted to know whether the apes' likelihood to go hunting for banana pieces hidden under cups, rather than taking a smaller banana piece already in front of them, depended on the "expected value" of their choices. Expected value is simply an item's worth, multiplied by your odds of getting it. If a 2-point conversion attempt is successful exactly half the time, then its expected value is 1 point.
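The expected-value arithmetic can be sketched in a few lines (the football success rates and banana sizes here are illustrative, not figures from the study):

```python
def expected_value(reward, probability):
    """An option's worth multiplied by the odds of getting it."""
    return reward * probability

# A 2-point conversion that succeeds exactly half the time is worth
# 1 point on average -- the same as a near-certain 1-point kick.
print(expected_value(2, 0.5))    # 1.0
print(expected_value(1, 0.99))   # 0.99

# For the apes: one large banana piece hidden under one of n blue cups.
# Pointing at a cup at random has expected value (piece size) / n.
large_piece = 1.0
for n_cups in (1, 2, 4):
    print(n_cups, "cups:", expected_value(large_piece, 1 / n_cups))
```

The same function covers both gambles: more blue cups means a lower probability of success and thus a lower expected value, which is the quantity the researchers varied against the guaranteed small piece.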

The 22 apes each sat through a series of experiments involving banana bits in cups. On one side of a table, they saw a small piece of banana placed under a yellow cup. Next to that was a row of blue cups,  anywhere from one to four of them. Under one of the blue cups was a larger piece of banana.

The apes knew the larger piece of banana was hidden under one of the blue cups, but unless there was only one blue cup, they didn't know exactly where the banana was. (They understood the setup because there was also a series of trials in which the apes watched the banana being placed under one of the blue cups.) In each trial, an ape could point to just one cup and get the reward--if there was any--underneath.

The yellow cup was a guaranteed small reward. The blue cups were a gamble. And the size of the gamble (in other words, its expected value) depended on how many blue cups were on the table. It also depended on the difference in size between the two banana chunks. The "safe" piece of banana in the yellow cup ranged from one-sixth to two-thirds the size of the large piece.

The researchers found that the apes' decisions did correlate to the expected value of their options. Overall, as the expected value of picking a blue cup increased--there were fewer blue cups on the table, or the safe piece of banana was small and untempting--apes opted more often to try a blue cup. When the expected value of the gamble was lower--because there were a lot of blue cups to choose between, or the safe banana piece was large to begin with--they were more likely to stick with the yellow cup.

Adjusting choices based on the expected value of each option is similar to how humans would decide. But the apes were less human-like in their general propensity for risk. Even at the lowest possible expected values, apes chose to gamble on a blue cup more than 50% of the time.

In other words, apes acted more like humans playing the lottery than humans kicking an extra point after a touchdown. These apes, of course, didn't have their coaching jobs on the line. They might have just enjoyed playing the cup game. And in a human football game, there are plenty of situations in which a kicked extra point is better than going for 2--even though its expected value, with a success rate of about 50%, is the same.

But even outside of football games, humans are known by psychologists for being risk averse, especially when it comes to potential gains. We'd rather take a small guaranteed reward than a larger and riskier one. (For losses, though, we tend to feel the opposite way.)

When the researchers broke down their results by species, they found that while all four species were risk prone, bonobos were a little more conservative in their choices than chimps were. With only a small number of ape subjects, it's hard to draw any serious conclusions. But it's interesting to speculate about the differences between us and our two closest living relatives. Have chimps evolved to take more risks, always gambling on finding something better, because in the wild they must search for fresh fruit year-round? Can bonobos afford to be more conservative because their diet in the wild is more flexible? What factor in our past put risk-averse humans at an evolutionary advantage?

Next time your favorite football team takes an overly conservative extra point, don't blame the coach for his evolutionary history. You could always call up the owners, though, and suggest they hire a chimpanzee instead.

Photo: Flickr/Mat_the_W

Haun, D., Nawroth, C., & Call, J. (2011). Great Apes' Risk-Taking Strategies in a Decision Making Task PLoS ONE, 6 (12) DOI: 10.1371/journal.pone.0028801

• January 1, 2012
• 09:41 AM
• 697 views

Copyright vs Medicine: If this topic isn’t covered in your newspaper this weekend, get a new newspaper

According to the New England Journal of Medicine, after thirty years of silence, the authors of a standard clinical psychiatric bedside test have issued takedown orders against new medical research.... Read more »

Newman, J., & Feldman, R. (2011) Copyright and Open Access at the Bedside. New England Journal of Medicine, 365(26), 2447-2449. DOI: 10.1056/NEJMp1110652

• December 30, 2011
• 04:57 PM
• 1,062 views

The Myth of the 27 Club

A few months ago, I turned 27. Had I been a famous musician, I might well have dreaded this moment and gone into hibernation for a year, because 27 is the fabled age of the rock star death.

The member list of the ’27 Club’ – those musicians who met an untimely end at the age of 27 – reads like a Who’s Who of influential rock stars: Jimi Hendrix, Jim Morrison, Kurt Cobain, Janis Joplin, Brian Jones, and so the list goes on.

So why do so many musicians seem to crash and burn at the age of 27?... Read more »

• December 22, 2011
• 05:42 PM
• 760 views

Unruly beasts in the jungle of molecular modeling

The Journal of Computer-Aided Molecular Design is running a smorgasbord of accomplished modelers reflecting upon the state and future of modeling in drug discovery research, and I would definitely recommend that anyone interested in the role of modeling - especially experimentalists - take a look at the articles. Many of them are extremely thoughtful and balanced and take a hard look at the lack of rigorous studies and results in the field; if there was ever a need to make journal articles freely available, it was for these kinds, and it's a pity they aren't. But here's one that is open access, by some researchers from Simulations Inc., who talk about three beasts (or in the authors' words, "Lions and tigers and bears, oh my!") in the field that are either unsolved or ignored or both.

1. Entropy: As they say, entropy, like death and taxes, is one of life's constants. In modeling both small molecules and proteins, entropy has always been the elephant in the room, blithely ignored in most simulations. At the beginning there was no entropy at all. Early modeling programs then started extracting a rough entropic penalty for freezing certain bonds in the molecule. While this approximated the loss of ligand entropy on binding, it did nothing to account for the conformational entropy loss that results when a panoply of diverse conformations in solution is compressed into a single bound conformation. But we were just getting started. A very large part of the entropy of binding a ligand to a protein comes from the displacement of water molecules in the active site - essentially their liberation from being constrained prisoners of the protein to free-floating entities in the bulk. A significant advance in accounting for this factor was an approach that explicitly and dynamically calculated the enthalpy, entropy and therefore the free energy of bound waters in proteins.

We have now reached the point where we can at least think of doing a reasonable calculation on such water molecules. But water molecules are often ill-localized in protein crystal structures because of low resolution, inadequate refinement and other reasons, and it's not easy to perform such calculations for arbitrary proteins without crystal structures. A large piece of the puzzle that's still missing, however, is the entropy of the protein itself, which is extremely difficult to calculate on many fronts. Firstly, the dynamics of the protein is often not captured by a static x-ray structure, so any attempt to calculate protein entropy in the presence and absence of ligands has to shake the protein around. Currently the favored process for doing this is molecular dynamics (MD), which suffers from its own problems, most notably the accuracy of what's under the hood - namely, force fields. Secondly, even if we can calculate the total entropy change, what we really need to know is how the entropy is distributed among various modes, since only some of these modes are affected by ligand binding. An example of a situation in which such details matter is the case of slow, tight-binding inhibitors illustrated in the paper: two different prostaglandin synthase inhibitors show almost identical binding orientations in the crystal structure, yet one is a weak inhibitor that dissociates rapidly while the other is slow and tight-binding. Only a dynamic treatment of entropy can explain such differences, and we are still quite far from being able to do this in the general case.

2. Uncertainty: Of all the hurdles facing the successful application and development of modeling in any field, this might be the most fundamental. To reiterate, almost every kind of modeling starts with a training set of molecules for which the data is known and then applies the results from this training set to a test set for which the results are unknown. Successful modeling hinges on the expectation that the data in the test set is sufficiently similar to that in the training set. But problems abound. For one thing, similarity is in the eye of the beholder, and what seems a reasonable criterion for assuming similarity may turn out to be irrelevant in the real world. For another, overfitting is a constant issue, and results that look perfect on the training set can fail abysmally on the test set.

But as the article notes, the problems go further, and the devil is in the details. Modeling studies very rarely try to quantify the exact differences between the two sets and the error resulting from that difference. What's needed is an estimate of predictive uncertainty for single data points, something which is virtually non-existent. The article notes the seemingly obvious but often ignored fact that "there must be something that distinguishes a new candidate compound from the molecules in the training set". This 'something' will often be a function of the data that was ignored when fitting the model to the training set. Outliers that were thrown out because they were... outliers might return with a vengeance in the form of a new set of compounds enriched in precisely the properties that were ignored.

More fundamentally, the very nature of the model used to fit the training set may be severely compromised. In its simplest incarnation, for instance, linear regression may be used to fit data points to relationships that are inherently non-linear. In addition, descriptors (such as molecular properties supposedly related to biological activity) may not be independent. As the paper notes, "The tools are inadequate when the model is non-linear or the descriptors are correlated, and one of these conditions always holds when drug responses and biological activity are involved". This problem penetrates every level of drug discovery modeling, from basic molecular-level QSAR to higher-level clinical or toxicological modeling. Only a judicious and high-quality application of statistics, constant validation, and a willingness to wait (for publication, press releases etc.) until the entire analysis is available will keep erroneous results from seeing the light of day.

3. Data curation: This is an issue that should be of enormous interest not just to modelers but to all chemical and biological scientists concerned about information accuracy. The well-known principle of Garbage In, Garbage Out (GIGO) is at work here. The bottom line is that there is an enormous amount of chemical data on the internet that is flawed. For instance, there are cases where incorrect structures were inferred from correct names of compounds:

"The structure of gallamine triethiodide is a good illustrative example where many major databases ended up containing the same mistaken datum. Until mid-2011, anyone relying on an internet search would have erroneously concluded that gallamine triethiodide is a tribasic amine. The error resulted from mis-parsing the common name at some point as meaning that the co... Read more »
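The training-set/test-set failure mode in point 2 is easy to reproduce with a toy example of my own (not from the article): a high-degree polynomial fits ten noisy training points essentially perfectly, yet generalizes worse than a simple straight line.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: a noisy linear relationship between a "descriptor" x and
# an "activity" y (illustrative only).
def make_data(n):
    x = np.linspace(0, 1, n)
    return x, 2 * x + rng.normal(0, 0.2, n)

x_train, y_train = make_data(10)
x_test, y_test = make_data(50)

def fit_and_test(degree):
    """Fit a polynomial on the training set, report train/test MSE."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

# A degree-9 polynomial can pass through every training point (near-zero
# training error) but its test error reveals the overfitting.
for deg in (1, 9):
    train_mse, test_mse = fit_and_test(deg)
    print(f"degree {deg}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```

The perfect training-set fit is exactly the "looks perfect, fails abysmally" trap: without held-out data, or an estimate of per-point predictive uncertainty, the degree-9 model would look like the better one.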

• December 21, 2011
• 09:16 PM
• 501 views

The 27 Club: Are Famous 27-Year-Old Musicians at Risk?

When Amy Winehouse’s death was reported in July of 2011, conspiracy theorists immediately declared that her talent and her age, 27, had doomed her to being yet another member of the “27 club”, a club composed of famous musicians who all … Continue reading →... Read more »