S.KRAUSE

Spam, Spam, Eggs and Spam

Since last writing I finished the Susanna Clarke novel, I read Elizabeth Kostova's, and I am most of the way done with Marisha Pessl's. The semester is nearly at an end, the weather is chaotic, and writing about myself is a diversion. A digression.

I. Sites

I belong to far too many social networking sites, places like Friendster and MySpace and Facebook. They tend to lack taste or style, they are to a great extent populated by the woefully undereducated, by grammar and orthography deficient up-speakers, and by spammers, but they also have an undeniable usefulness.

Almost none of the people I care about from college or high school or even graduate school maintain their own personal websites; one or two have a c.v. online, at least those from graduate school, but I already know how to keep in touch with them—I see them several times a week. In fact, googling people I used to know often leads one back to this website; a mention here, a comment there, and suddenly I am the web's leading resource on persons A, B, and C. Pay-for-full-access sites along the lines of classmates.com and reunion.com have one major weakness: they cost money, and who wants to pay money?

This brings me back to Facebook, Friendster, MySpace, Orkut, and the rest—sites with free access. Andrew brought me to Friendster in 2003; I had been avoiding it, and I am a bit of an anti-hipster. Almost all those listed there as friends are friends from college—were I to keep in touch with them, it's how I would do so. I do not know when I got a MySpace account—perhaps it was long before the site became big, back when it was smaller than Friendster, before Facebook had any reputation—but sometime while I was in Berlin I decided to fill out my profile, and somehow it turned into a way to keep in touch with a few folks from high school, a couple from college, and Madisonians with whom I too rarely meet up.

II. Spam

One of the downsides to said sites? The obvious: spam. Links to adult sites, strippers, mail-order-brides (or is that email-order?), etc.

Spam from Priscilla at MySpace

Foreward?

No. Nor ‘forward,’ and only barely comprehensible. But a lot like spam and a cheap attempt to get hits for an adult site ... yes, but not ‘foreward.’

Spam from Alex at MySpace

This one almost appeared genuine, for this one, as the one above, claimed to be local (Madison for the former, Milwaukee for the latter), and the latter claimed to be an English major at a local university. That itself—the incomprehensible English orthography—was enough to make me ask, even if you are for real, why in the hell would I want to look twice at such a dim-witted ditz?

Spam from Anna at Friendster

“Anna,” too, is unable to proofread a letter, but she was more obvious about her connection to an adult site or such than the others. webcma, indeed.

Something about spam fascinates me. It is unoriginal; the same phrases, ideas, approaches are used over and over, transferred from one medium to the next; and the fact that this continues says that there is at least some R.O.I. ... it is making somebody money, and that makes one ask, what sort of stupid person participates?

What sort of person falls for pyramid schemes, chain letters, confidence games, the more obvious of hoaxes and urban legends?

I have little (not the same as zero) sympathy for those who fall victim to phishing attacks and other forms of social engineering.

III. SQL

Analog and Digital ...

I store lots of data. My music is digitized, my photos come from a digital camera, I get spam rather than junk mail.

I have collected postcards, letters from family and friends, pictures from my childhood, and shelves of books, but electronic correspondence and digital documents outweigh every aspect of my material information library. I have several database projects I wish to undertake if not complete, and archiving spam is not one of them.

They can be listed and summarized as follows: “documents,” email and similar correspondence, an address book, recipes, genealogy or family tree information, and bibliographic information.

A number of existent projects along these lines all disappoint me primarily because the simple ones have idiotic data models.

Let me explain. We have a number of common models and metaphors for databases. Index cards, the index of a book, a spreadsheet. These can, in fact, all be aspects of the broader model of how relational databases work: data is stored in tables that resemble spreadsheets (rows and columns), index tables are used in efficient referencing of frequently used material, and we all remember how a good library card catalogue system allows one to reference a variety of relations (author, subject, title).

But there are a few ground rules for modeling your data. Don't duplicate information. Try for one-to-one relationships; many-to-many relationships are hard to model directly and call for a linking table of their own. Every entry (row) should have a unique identifier that does not change.

Most address book projects treat an address book entry as a single entity, when in fact it should consist of the merging of several entities. When I care about an address book entry, I want a name, an address, phone numbers (business, home, and cell), an email address or perhaps two, etc. A person can have multiple phone numbers, multiple email addresses, etc. A single address can belong to multiple people (family members, roommates, officemates in a business, etc.), and so on. A better (not necessarily the best) model for an address book should have a table for people, a table for addresses, and tables for the different types of contacts (phone, email, IM)—perhaps different tables for each, perhaps a contact table listing the account in one column, the type of account in another, or such. Then you have a table that says person ABC has address 123. Another to say person ABC has phone number 456, and so on. To display an address book entry, you might have a list of people in the address book; when you click on the person's name the database is queried for all the addresses, phone numbers, etc. that are tied to that person.
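To make the address book idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the table and column names (person, address, contact, person_address), the sample people, and the entry() helper. It also takes one shortcut, tying each contact directly to a person rather than going through a separate link table.

```python
import sqlite3

# Hypothetical schema; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT, city TEXT);
-- One contact table: the account in one column, its type in another.
CREATE TABLE contact (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    kind      TEXT NOT NULL,   -- 'phone', 'email', 'im'
    value     TEXT NOT NULL
);
-- Link table: "person ABC has address 123".
CREATE TABLE person_address (
    person_id  INTEGER NOT NULL REFERENCES person(id),
    address_id INTEGER NOT NULL REFERENCES address(id),
    PRIMARY KEY (person_id, address_id)
);
""")

# Made-up sample data: two roommates sharing one address.
conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.execute("INSERT INTO address (id, street, city) "
             "VALUES (1, '12 Example St', 'Madison')")
conn.executemany("INSERT INTO person_address VALUES (?, ?)", [(1, 1), (2, 1)])
conn.executemany(
    "INSERT INTO contact (person_id, kind, value) VALUES (?, ?, ?)",
    [(1, "phone", "555-0100"),
     (1, "email", "alice@example.org"),
     (2, "phone", "555-0101")])


def entry(conn, person_id):
    """Collect everything tied to one person, as when a name is clicked."""
    name = conn.execute("SELECT name FROM person WHERE id = ?",
                        (person_id,)).fetchone()[0]
    addresses = conn.execute(
        "SELECT a.street, a.city FROM address a "
        "JOIN person_address pa ON pa.address_id = a.id "
        "WHERE pa.person_id = ? ORDER BY a.id", (person_id,)).fetchall()
    contacts = conn.execute(
        "SELECT kind, value FROM contact WHERE person_id = ? ORDER BY id",
        (person_id,)).fetchall()
    return {"name": name, "addresses": addresses, "contacts": contacts}


alice = entry(conn, 1)
```

The address is stored once even though both people live there, so a correction to the street name happens in a single row.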

A recipe database would have recipes, as well as ingredients; there can be multiple ingredients in a recipe, and a single ingredient can be used in many different recipes. Any time you have a many-to-many sort of relationship, you need to have different tables. Sure, a flatfile might work to an extent in a limited way: your recipe is something HTML-esque (take your preferred, more rigid dialect of *ML) with several sections (ingredients, directions, notes), but what happens when you want to have a searchable recipe book? What happens when you want to find all the recipes that use cinnamon—do you really want to do a full text search through every recipe file?
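The cinnamon question is a good test of the many-to-many point. What follows is a rough sketch with Python's sqlite3 module, not a finished recipe manager. The table names (recipe, ingredient, recipe_ingredient), the sample recipes, and the recipes_using() helper are all invented for the example.

```python
import sqlite3

# Hypothetical schema; names and sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipe     (id INTEGER PRIMARY KEY, title TEXT NOT NULL,
                         directions TEXT);
CREATE TABLE ingredient (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Linking table: one row per (recipe, ingredient) pair.
CREATE TABLE recipe_ingredient (
    recipe_id     INTEGER NOT NULL REFERENCES recipe(id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(id),
    amount        TEXT,
    PRIMARY KEY (recipe_id, ingredient_id)
);
""")

conn.executemany("INSERT INTO recipe (id, title) VALUES (?, ?)",
                 [(1, "Apple pie"), (2, "Snickerdoodles"), (3, "Guacamole")])
conn.executemany("INSERT INTO ingredient (id, name) VALUES (?, ?)",
                 [(1, "cinnamon"), (2, "apple"), (3, "avocado")])
conn.executemany("INSERT INTO recipe_ingredient VALUES (?, ?, ?)",
                 [(1, 1, "1 tsp"), (1, 2, "6"), (2, 1, "2 tsp"), (3, 3, "2")])


def recipes_using(conn, ingredient_name):
    """Titles of all recipes that list the given ingredient."""
    rows = conn.execute(
        "SELECT r.title FROM recipe r "
        "JOIN recipe_ingredient ri ON ri.recipe_id = r.id "
        "JOIN ingredient i ON i.id = ri.ingredient_id "
        "WHERE i.name = ? ORDER BY r.title", (ingredient_name,))
    return [title for (title,) in rows]


cinnamon_recipes = recipes_using(conn, "cinnamon")
# cinnamon_recipes == ['Apple pie', 'Snickerdoodles']
```

No recipe text is scanned at all; the lookup goes through the linking table, which is the whole argument against the flat file.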

Genealogy is both easier and harder, which is to say, it could all be just people, but how to model the relationships? I suppose there are a couple of decent ways to capture most relationships within a single table (a column for father and one for mother, each pointing back to another entry in the table; to model aunts, uncles, cousins, siblings, etc., just query parents: siblings share parents, for an uncle or aunt find the sibling [see previous task] of a parent, etc.), but this only works for simple things, not considering adoption, step-parents, etc. Thus, one probably needs the people and a table for relationships that combines two people and the type of relationship (if you use one relationship table; for multiple tables, something along the lines of ID, parent, child). And then you have events (birth, death, marriage, baptism, confirmation, divorce, etc.). An individual can have multiple instances of many of these, so they likely deserve a separate table. On top of that, a good researcher of family history should also have documents (birth certificates, obituaries, etc.), and these shouldn't be locked into the individual, either. Records can lie, names can be misspelled, and so on.
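A small sketch of the relationship-table variant, again with Python's sqlite3 module. The names parent_child and event, the 'birth' / 'step' labels, the people, and the dates are all made up for the example, and documents are left out to keep it short. The two helpers follow the recipe described above: siblings share a parent, and aunts and uncles are siblings of a parent.

```python
import sqlite3

# Hypothetical schema and sample family; every name here is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per parent/child link, with the type of relationship.
CREATE TABLE parent_child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER NOT NULL REFERENCES person(id),
    child_id  INTEGER NOT NULL REFERENCES person(id),
    kind      TEXT NOT NULL DEFAULT 'birth'   -- 'birth', 'adoptive', 'step'
);
-- Events live in their own table: a person can have several of one kind.
CREATE TABLE event (
    id        INTEGER PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES person(id),
    kind      TEXT NOT NULL,                  -- 'birth', 'marriage', ...
    date      TEXT
);
""")

conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)",
                 [(1, "Greta"), (2, "Marta"), (3, "Ruth"),
                  (4, "Sam"), (5, "Paul")])
conn.executemany(
    "INSERT INTO parent_child (parent_id, child_id, kind) VALUES (?, ?, ?)",
    [(1, 2, "birth"), (1, 3, "birth"), (2, 4, "birth"), (5, 4, "step")])
conn.executemany("INSERT INTO event (person_id, kind, date) VALUES (?, ?, ?)",
                 [(2, "marriage", "1970"), (2, "marriage", "1985")])


def siblings(conn, person_id):
    """People who share at least one parent with person_id."""
    rows = conn.execute(
        "SELECT DISTINCT p.name FROM parent_child mine "
        "JOIN parent_child theirs ON theirs.parent_id = mine.parent_id "
        "JOIN person p ON p.id = theirs.child_id "
        "WHERE mine.child_id = ? AND theirs.child_id != ? "
        "ORDER BY p.name", (person_id, person_id))
    return [name for (name,) in rows]


def aunts_and_uncles(conn, person_id):
    """Siblings of each parent (of any kind) of person_id."""
    parents = conn.execute(
        "SELECT parent_id FROM parent_child WHERE child_id = ? "
        "ORDER BY parent_id", (person_id,)).fetchall()
    found = []
    for (parent_id,) in parents:
        for name in siblings(conn, parent_id):
            if name not in found:
                found.append(name)
    return found


marta_marriages = conn.execute(
    "SELECT COUNT(*) FROM event WHERE person_id = ? AND kind = 'marriage'",
    (2,)).fetchone()[0]
```

Because the step-parent is just another row with a different kind, nothing about the schema has to change to record him.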

I wanted a movie database as well, perhaps just a list of things I have on DVD. A movie has multiple actors. Each actor can be in multiple movies. When it comes to having a movie, I can have it in multiple formats (DVD, VHS, etc.), so I probably want a movies table, a people table, and a table for what I have. The table for what I have lists the movie and the format, plus perhaps a comment. There could be a table that links movies with people, and also lists their role. What is nice about such a format is that it makes it trivial to search for information. Give me all the movies with FamousActor—click the name, and here they come. Click on the movie name, and see all the actors and roles, and so on.
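Here is roughly what that could look like, sketched with Python's sqlite3 module. The table names (movie, person, credit, holding), the placeholder titles and actors, and the three helper functions are assumptions of mine, not a real catalogue.

```python
import sqlite3

# Hypothetical schema; titles and names are placeholders, not real films.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE movie  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- Links movies with people and lists their role.
CREATE TABLE credit (
    movie_id  INTEGER NOT NULL REFERENCES movie(id),
    person_id INTEGER NOT NULL REFERENCES person(id),
    role      TEXT NOT NULL,
    PRIMARY KEY (movie_id, person_id, role)
);
-- What I have: the movie, the format, perhaps a comment.
CREATE TABLE holding (
    id       INTEGER PRIMARY KEY,
    movie_id INTEGER NOT NULL REFERENCES movie(id),
    format   TEXT NOT NULL,   -- 'DVD', 'VHS', ...
    comment  TEXT
);
""")

conn.executemany("INSERT INTO movie (id, title) VALUES (?, ?)",
                 [(1, "First Film"), (2, "Second Film")])
conn.executemany("INSERT INTO person (id, name) VALUES (?, ?)",
                 [(1, "FamousActor"), (2, "OtherActor")])
conn.executemany("INSERT INTO credit VALUES (?, ?, ?)",
                 [(1, 1, "Hero"), (2, 1, "Villain"), (1, 2, "Sidekick")])
conn.executemany(
    "INSERT INTO holding (movie_id, format, comment) VALUES (?, ?, ?)",
    [(1, "DVD", None), (1, "VHS", "worn tape"), (2, "DVD", None)])


def movies_with(conn, actor):
    """All movie titles in which the named person has a credit."""
    rows = conn.execute(
        "SELECT DISTINCT m.title FROM movie m "
        "JOIN credit c ON c.movie_id = m.id "
        "JOIN person p ON p.id = c.person_id "
        "WHERE p.name = ? ORDER BY m.title", (actor,))
    return [title for (title,) in rows]


def cast_of(conn, title):
    """(name, role) pairs for one movie."""
    return conn.execute(
        "SELECT p.name, c.role FROM credit c "
        "JOIN person p ON p.id = c.person_id "
        "JOIN movie m ON m.id = c.movie_id "
        "WHERE m.title = ? ORDER BY p.name, c.role", (title,)).fetchall()


def formats_of(conn, title):
    """Formats in which I hold a given movie."""
    rows = conn.execute(
        "SELECT h.format FROM holding h JOIN movie m ON m.id = h.movie_id "
        "WHERE m.title = ? ORDER BY h.format", (title,))
    return [fmt for (fmt,) in rows]


famous = movies_with(conn, "FamousActor")
# famous == ['First Film', 'Second Film']
```

Clicking a name maps to movies_with(), clicking a title maps to cast_of(), and owning the same film on DVD and VHS is simply two rows in holding.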

As I said, this is trivial.

I am not claiming that the models presented here are the best such models, but when I think of some of the braindead examples of address books, movie databases, etc., that I see out there, I am torn between vomiting and crying. There are good solutions as well—frameworks and content management systems, for example—that take proper, normalized data modeling into account, but they are overkill for my needs (I'm just one guy with a small movie and book collection sitting in a house typing at a keyboard ...).

IV. State Street

Tuesday has become dissertator-burrito-day in Madison, a gathering that brings together me, Di, and Felecia. It is always the same: Qdoba, vegetarian burrito, perhaps some chips, a glass of water, about 2pm (so Di can get off work or similar), and going until we've run out of conversation.

D&F recently got MySpace accounts, so I rearranged my friends to place them first (see: Part I: Sites). After the food I pulled out the laptop, connected to Steep & Brew's wireless access point, and browsed a bit. We viewed and laughed at the spam (see: Part II: Spam), and then made a move to YouTube for the amazing 70s Danish version of Apache, Boney M's 1978 Rasputin, Lordi's 2006 Eurovision winner Hard Rock Hallelujah (which I've been meaning to analyze in detail), and my favorite two recent videos: the Honda Element commercial, No Pinch, and the stop-motion-video masterpiece, Tony vs. Paul.

Thereafter I found myself walking down a damp and shimmering State Street toward the Fair Trade Coffeehouse for a cup of coffee and a blueberry fritter. My only complaint: crappy wireless access. It is the same all over campus and throughout the city—connections are dropped, bandwidth is meager, and you realize, this is no replacement for DSL or cable at this point.

People mosey up and down this street, they bike, they pause and smoke and beg and wave.

Lynn wandered by, saw me in the window, so walked in, and what should have been a five minute chat about the merits of various coffee liqueurs ended up an hour or more later focusing on teaching fifth semester German and the intellectual qualifications of students to read metaphorically.

Well, that was my focus, at least.

—December 12 2006