# A talk about the Halting Problem

Here’s a talk that I gave for high school students about the halting problem and its applications in math. Bonus feature: I ended way too early, so at the end I tell various weird stories about historical mathematicians.

# String indexing and compression with the FM-Index

I’m currently interning at Seven Bridges Genomics for a few weeks, which is awesome. They asked me to give a talk on the FM-Index–a highly compressed data structure that allows for efficient search queries. This is used for a fundamental step in the sequencing a genome, called alignment, in which you take a bunch of strings that are about 100 characters long that the chemistry spits out and try to fit them together into one long meaningful genome. (For more about alignment, check out this awesome blog post by Nate Meyvis.)

Some SBG people wanted me to share my slides, but the slides that I typically make are completely unsharable. So, I made a video, and now my lucky blog readers get the opportunity to listen to me talk for 40 minutes about the FM-Index! (You know… if you’re into that sort of thing…)

(You should probably watch it in 720p or 1080p because otherwise some of the text is annoyingly blurry for some reason.)

# A Little Security Flaw in Lots of Places

Hi, nerd blog. I’m not a person who cares much about privacy (which makes the fact that I’m currently a PhD student studying cryptography a little weird). But, there are some relatively basic levels of privacy that I think people are clearly entitled to.

For example, suppose that I want to know whether or not someone has an account on a certain website, and I don’t think he’d want to tell me. It’s not hard to imagine an example where that information could be harmful. The site might be for online dating or porn, or it might be somehow related to a counterculture, a protest movement, etc. The person might be someone that I’m angry at, a romantic interest, someone that I’m considering hiring, an ex, etc. Anyone who’s gone to middle school should recognize that people will certainly look for this information if they know where to find it.

It turns out that on many websites, you can do this–including extremely large websites. I’ll use economist.com as my first example because it’s a non-embarrassing website that I have an account on that has this flaw, as shown below.

Here’s what happens when I use my e-mail address and gibberish for a password:

So, you now all know that I have an account on this website, and if I want to know whether or not someone else does, all I need is her e-mail address.

Facebook also makes this mistake. Here are my Facebook privacy settings, in which I quite unambiguously say that you should need to be friends with one of my friends to find my Facebook account using my e-mail address:

On Facebook, this actually also works with phone numbers, and either way, they conveniently show an image of me and my name, in addition to verifying that I have an account.

Now, that’s just stupid, and it shouldn’t happen. The way to fix this is incredibly obvious: On a failed login, just say “Sorry, that e-mail/password combination is wrong.” instead of telling users specifically which credential is wrong. This has been accepted best practice for a long time, and it’s amazing to see such large companies screw it up. However, it’s hard not to roll your eyes because, well, most people would not be embarrassed if someone who already had their e-mail address also knew whether or not they subscribed to The Economist or have a Facebook account.

So, let’s pick something more sensitive. Say, for example, pornography. According to Wikipedia, Pornhub is one of the largest porn sites, and the 81st most popular website on the entire internet according to Alexa rankings (which likely undercounts such sites). It’s fairly obvious why someone wouldn’t want anyone with his e-mail address to know whether or not he has a Pornhub account. While Pornhub does not fall to the exact attack that I describe, it’s a nice example of a slightly different attack that accomplishes the same thing. Here’s what happens when you enter an invalid e-mail address into Pornhub’s “Forgot your password” page:

So, want to know whether or not someone has a Pornhub account? Go to the website, click “Forgot your  password,” and enter his e-mail address. (Of course, if he does have an account, he’ll get an e-mail. He might ignore it, but if he’s aware of this attack, he might recognize that someone did this to him. He won’t know who, though, and you’ll still have the information.)

In spite of the fact that this has been known for a long time, I’m willing to bet that the second attack works on the vast majority of sites on the web. I’ll spare more screenshots and naming-and-shaming, but a quick check shows that it works on lots of other porn sites, lots of dating sites, lots of online forums about sensitive topics, sites for companies devoted to web security, and, in general, just about every site I checked. (Sometimes, you have to use the “Forgot your username” link instead of “Forgot your password.”)

The solution? Obviously, just say “If you’ve entered the correct address, you will receive an e-mail shortly. If you don’t, please try a different address.” Fortunately, one site that screwed a lot of stuff up actually got this right: Healthcare.gov.

# The Bootstrap Species

Double disclaimer:

1. This is gonna be a rambling rant, as usual.
2. I believe in evolution, not intelligent design. But, part of what’s really cool about evolution is that it creates things that look intelligently designed. It’s therefore very convenient to talk about a species as though it was built to some deity’s specifications. So, I’m gonna make liberal use of that metaphor. Feel free to replace evocative, convenient, and technically incorrect phrases like “We were designed to survive” with the more boring, stilted, and accurate “We naturally happen to be ridiculously good at surviving because if we weren’t better at it than most of our innumerable competitors we wouldn’t exist” if you feel like it.

When I walk into a pharmacy (and, as a typical New Yorker, I buy a remarkable amount of my stuff at pharmacies), I’m often struck by the ridiculous number of things that we have to cure our ailments.

Mostly, it’s a reminder of the amazing variety of ways in which we break: Our bodies crack, leak, buckle, and bleed. Things grow on our bodies and in them. They eat us. Our organs stop functioning or function too quickly or too slowly. We itch. Even our brains–the things that are supposed to be us–often behave in ways that we wish they didn’t. Pharmacies have all sorts of things to cure many of these ailments and to make many more of them more tolerable.

Even stranger, we’re often troubled when things function exactly as they should. Our faces and bodies sprout hair; our nails grow long; sex leads to pregnancy. The pharmacy has solutions to all of these problems too–problems caused by a perfectly healthy body doing what it was designed to do.

There are even solutions to the problems caused by our solutions to other problems.

And, of course, things are pretty awesome as a result. We’ve slowly developed methods (some complicated but many amazingly simple) to tinker with these extremely complex machines that we live in–that we are–to get them to do what we want. We’ve built sprawling pharmacies, and we live long lives with amazing consistency. We’re happier and healthier; we even look better.

But, that is exactly what we’re doing: We’re tinkering with our bodies, and it’s fascinating. We’re now using the most sophisticated machines that we’ve ever encountered to do tasks for which they were never designed. And it’s working! More specifically, we pursue not just survival, not just reproduction, but happiness.

# Measuring the Difficulty of a Question (What Complexity Theory Is)

Suppose you’re given some clear question with a well-defined answer. Computer scientists often like to consider number-theoretic questions, such as “Does 19 divide 1547 evenly?”

It seems quite natural to try to assign to this problem some sort of measure of difficulty. For example, from an intuitive perspective, it’s easier to figure out whether 19 divides 77 than whether it divides 1547. And, figuring out whether two divides 1548 is totally trivial. And, of course, there are much harder problems, like counting all the prime numbers below 1548 .

Some problems’ difficulties take a second to figure out. For example, is it harder to figure out what the square root of 15,129 is or to count the squares less than 15,130? Intuitively, it might seem like the second one should be harder, but really, the answer to the second question is the same as the answer to the first–15,129 is 123^2, and there are 123 squares less than 15,130 (not counting zero), one for each positive integer less than or equal to 123. So the first problem is exactly as difficult as the second.

So, some problems’ difficulties are intuitively comparable. Some are still clearly comparable, but realizing that isn’t completely trivial. But, some problems just seem terribly unrelated. For example, is it easier to count the primes less than 78 or to tell whether 1547 is prime? Is it easier to count the squares less than 123,456 or count the primes less than 78? For questions like these, it seems likely that the actual answers actually depend on our definition of difficulty.

So, we need a rigorous definition of difficulty. The field that studies such definitions and categorizes problems according to their difficulty is called complexity theory, and it’s one of my favorite branches of mathematics. What follows is an NSD-style introduction to the field:

[toc]

# What’s Cancer?

There’s a nice laymen’s explanation for how most types of diseases work. Viruses, for example, are little pods that inject DNA or RNA into your cells, thus replicating themselves. (Here’s an awesome video by Robert Krulwich and David Bolinsky illustrating how a flu virus works.) Similarly, bacteria, fungi, and various other parasites are independent organisms–made of cells just like us–and its their process of going about their lives inside of us–eating various things (sometimes parts of us), excreting, releasing various chemicals, etc–that makes us sick (or gives us rashes or helps us digest food or boosts our immune system, etc). These various pathogens have evolved to kill. Autoimmune diseases are incredibly complicated (and often poorly understood), but the rough idea is clear: We have cells in our body that are designed to kill off viruses, bacteria, fungi, etc., and sometimes they screw up and kill the good guys instead of the bad guys.

All of those simplified descriptions make perfect sense, and as I’ve become more of a medical nerd, they’ve essentially held up to scrutiny. In other words, they’re good models.

However, the pop-culture description(s) of cancer never really satisfied me. In fact, I would argue that most descriptions of cancer actually ignore the basic question of what the hell this thing is and why it exists. Even my beloved Wikipedia neglects to just come out and say what the damn thing is in its main article about it, though its article on carcinogenesis is pretty good. That type of thing gets on my nerves–I don’t really understand how people can be comfortable talking about something without knowing what it is.

So, until recently, I knew embarrassingly little about cancer. Obviously, I understood that cells mutated and divided rapidly to form cancer, but what made them do that? And why are such a huge variety of things implicated in causing cancer? Radiation, cigarette smoke, HPV, and asbestos are very different things, so how can they all cause the same type of disease? What makes these evil mutant cells so damn deadly? What the hell is metastasis, and why does it happen? I did a decent amount of research and learned some stuff, but I still didn’t really understand what cancer actually is.

That changed thanks to the awesome NPR radio show/podcast Radiolab (which you should absolutely check out if you haven’t already–especially the archives). Their episode on cancer, “Famous Tumors,”finally provided me with a description of cancer that made any sense. It’s still the only decent layman’s explanation of what cancer actually is that I’ve been able to find (and I did a lot of searching while I was writing this blog post up). Once I understood the basics, I found that my other research on the subject actually made sense–even though almost all of it neglected to define the concept in question.

Anyway, you should probably just listen to “Famous Tumors” right now. Seriously. The part that led to my epiphany is a very brief segment that starts around minute eleven or so, and (as is Radiolab’s style), it’s described quite beautifully.

However, for those of you who’d rather read my ramblings, I’ll provide my own description below. In keeping with my tradition of verbosity, this post will take a while to actually mention cancer directly, but I swear the whole thing’s about cancer. Bear with me.

The story starts with a description of what we are:

[toc]

# A Better Idea than Signature by Squiggle

Like most of you, I sign stuff a lot.

Like many of you, I’m usually a bit embarrassed when I do it. My signature has devolved from a relatively legible cursive “Noah Stephens-Davidowitz” when I was in high school to my college “Noah S-D” to my post-graduation “N S-D” to my current series of four squiggles, which one might be persuaded are loosely derived from my initials together with a hyphen.

My business partner, Thomas, told me (I think only half-jokingly) that it was unacceptable for signing contracts with our clients. He accused me of just writing the number 900 lazily and sloppily. (I personally think it typically looks more like 907, but that day, I concede, it looked pretty 900ish.) Even people delivering food to my apartment, who only ask for my signature to protect themselves in the unlikely event that I later claim to have not received my food, have asked me to confirm that my signature is in fact a signature.

I would post an image of it for my readership to laugh at, but I suppose that that would be a bit of a security vulnerability.

But, isn’t that incredibly silly? I have in my mental possession four vaguely defined squiggles. For some reason, I’m forced to show people my squiggles all the time as some sort of confirmation that I, Noah Stephens-Davidowitz, am agreeing to something . And, it’s not just with delivery guys; I use my squiggles for extremely important interaction with governments, clients, my bank, etc. But, in spite of the fact that I show this thing all the time, I’m also keenly aware of the fact that posting it publicly on the internet is a terrible idea.

All of this is in case I at some point I say  “No, I never agreed to that.” Because of my signature, a slick lawyer could then confidently respond “But, if you didn’t agree to that, then why are these four squiggles here? Who but you could have squiggled four times on this piece of paper in such a way?”

Touche, slick lawyer.

# What Color Is

Disclaimer: I sorta figured this stuff out on my own. I did a tiny bit of research for this post, but I’m not very well-read at all on this particular field. So, there’s a decent chance I got something wrong. Oops… Please don’t take what I write here as gospel, and please let me know what I screwed up in the comments.

When I was a kid, I was taught some really confusing and vague things about color. There were the Official Colors of the Rainbow–red, orange, yellow, green, blue, indigo, violet (Roy G. Biv). Three colors were always called primary colors, but there seemed to be disagreement about whether the third one was yellow or green. (Yellow seemed like the obvious choice, though, since green’s just a mixture of blue and yellow. Right? How would you even make yellow from red, green, and blue?) There were various formulas for combining colors together to get other colors when I was painting in art class (E.g., red and green makes brown). But then there were different rules for mixing different colors of light together–like how white light is apparently a mixture of all other colors of light or something. There was also that weird line about why the sky was blue, which seemed to me to be equivalent to “The sky is blue because it’s blue.”

All of this left me very confused, with a lot of unanswered questions: Why isn’t brown a color of the rainbow? Sure, it’s a mixture of other colors, but so are green and orange and purple and whatever the hell indigo is. If orange isn’t a primary color, does that mean that everything that’s orange is really just red and yellow? And why is violet all the way on the other side of the rainbow from red even though it’s just a mixture of red with blue?

At the time, I was too shy to ask because people seemed so incredibly not bothered by all of this. I assumed I must have been missing something obvious.

As it happens, what I was missing isn’t at all obvious, but it’s also pretty easy to explain. (In particular, kids should be taught this.) All of this confusion goes away when you realize what color is and how our eyes and brains measure it. I’ll give a characteristically wordy explanation:

## Light, energy, etc.

A friend and I just accidentally invented a fun game. I like it because it cutely illustrates the current completely ridiculous state of our understanding of human nutrition. Namely, there’s an amazingly long list of things that people think about nutrition–such a long list that it’s just really really hard to imagine even a small fraction of these things actually being true.

(I have a lot more to say about human nutrition than just this. In short, I am extremely skeptical of any claims about human nutrition. I sort of touched on this in this old blog post on my poker blog, and I might do a bit more in the future.)

Anyway, here’s the game:

1. Choose something that people eat that some people think might have effects on your health (e.g. trans fats or calcium or acai berries).
2. Choose a word or phrase that is somehow related to human health (e.g. “asthma” or “concentration” or “bad for you”).
Four centuries ago, Newton showed that the gravitational field of a perfectly spherical object is equivalent to that of a point. In other words, if you pretend that the Earth is spherically symmetric (which is a really good approximation), then, if you want to know the net gravitational effect that comes from this massively complex system made up of about $10^{53}$ elementary particles–each one with its own mass creating its own gravitational field–it suffices to consider the gravitational effect of a single particle. (Admittedly, you need to consider a single particle that weighs 6,000,000,000,000,000,000,000,000 kilograms.) In other words, the gravitational field is simply given by $\frac{k}{r^2}$, where k is a constant and r is the distance from the center of the Earth. (Extending this model to include points within the Earth is easy as well.)