Answers to Episode 4 ("What's 'normal', really?")

Well that was a raging success… not!

Apparently database normalization isn’t something that web developers find all that interesting. (But thanks to malikyte and xhtmlcoder for keeping the question from being a complete ghost town!)

That’s a shame, though — there are all sorts of pragmatic reasons behind good data design. To name just a few: properly designed tables often perform better than their de-normalized brethren, normalized data is much easier to aggregate successfully, and (most importantly) properly designed tables are much easier to understand.

That last one’s really the crux behind normalizing tables. Remember — computers don’t care if we write good code; when we write good code, it’s so that future developers won’t curse our names. Data normalization falls into the same future-proofing category.

Anyway, though: on to the answers. I’ll be brief, I promise.

  1. The FDA’s nutritional content database is — to my utter surprise — actually 3NF (everything has a primary key, and every piece of data appears to be atomic). I have some quibbles with a couple of the design choices, but they’re actually pretty minor. It’s pretty remarkable when you come across data this clean out of the box.
  2. Although the population demographic data is fairly well designed (and easy to munge into better forms), it doesn’t even achieve 1NF: records lack primary keys. This is usually the case with public data, and it stinks; it makes tracking changes from version to version extremely difficult.
  3. The SEC filings were a trick question. They’re in a XML dialect, so normal forms don’t apply.

    I think it’s important to notice how different formats change the way we can produce and consume data; the SEC data is a pretty good example of well done XML, but it would be pretty difficult trying to push this data into a database in any sort of structured way.

    If I wanted to build a site around this stuff, I’d likely use something like Berkeley DB for XML instead of a relational database.

  4. The gas price data, though crammed into a Excel sheet used more for presentation than data management, is actually in 3NF (if only because it’s pre-aggregated data). The data of the measurement is the primary key, and all columns are singularly dependent on the primary key (i.e. price is a function of date, and nothing more).
  5. The Juvenile Arrest Rate data is, like the gas prices, nominally 3NF data crammed into Excel.

Coming up…

Tune in tomorrow for a special super-difficult (I hope) challenge to keep us all occupied over the holidays.

As always, if youâ??ve got a question, puzzle, or challenge that you think would make a good question for this quiz, email me at jacob -at- jacobian.org.

Replay

Category: community Time: 2006-12-21 Views: 1
Tags:

Related post

  • Episode 4: What's "normal," really? 2006-12-14

    Sorry about the missed week, fellow puzzlers - real life, and all that - I'll try not to let it happen again. Anyway, let's get right to this week's question. A few weeks ago, I posted a scavenger hunt for public data (answers); today we'll return to

  • What is MVC, really? 2011-12-30

    As a serious programmer, how do you answer the question What is MVC? In my mind, MVC is sort of a nebulous topic - and because of that, if your audience is a learner, then you're free to describe it in general terms that are unlikely to be controvers

  • What are subdomains, really? 2015-02-09

    In studying domain-driven design (DDD), I've come across the concept of subdomain, but I think I don't get it yet. My first understanding of this was that a subdomain is a subset of the domain of the application. In other words, it's a partition of t

  • What does it really take to make an MMORPG/Where should I go from here? 2011-05-27

    Foreword: I am posting this on answers.onstartups and gamedev.stackexchange because the expertise from each area will likely contribute useful information. Also, I saw similar questions to this, but I still want advice. So I'll change this question t

  • What bots are really worth letting onto a site? 2014-06-13

    Having written a number of bots, and seen the massive amounts of random bots that happen to crawl a site, I am wondering as a webmaster, what bots are really worth letting onto a site? My first thought is that allowing bots onto the site can potentia

  • What is Normalized DC Current Gain (as oppossed to DC Current gain)? 2012-03-09

    I have a table that presents DC Current Gain for BC548B transistor: and a graph that presents Normalized DC Current Gain: Are those two things related? What does "normalized" mean? Why is the Y-axis labeled with numbers from 0.2 to 2 instead of

  • What does unregister really do? 2013-06-17

    Today I lost my iPhone and because of that I proceeded to unregister it from Apple as http://support.apple.com/kb/ht2526 indirectly suggests. What does it really mean to unregister a product? --------------Solutions------------- It removes it as bein

  • What is Lightroom *really* doing when I change a Camera Calibration? 2014-08-24

    What does LR / ACR really do when a user changes the Camera Calibration setting? Here's Adobe's statement (for LR4, but I didn't find a newer one and there's no reason for this to change): Lightroom uses two camera profiles for every camera model it

  • Using a response 'answer' to find what percentile it is within a column 2016-06-15

    The code currently asks the user for two separate numbers, and it divides them and the answer pops up in a message box. What I want to do next is use that answer and figure what percentile it is within column "T" in my workbook. If response = vb

  • Whats the normal Enterprise server configuration who use VMWARE 2009-12-17

    I am using vmware in my home and i have 8Gb RAM. I was thinking that in Practical or enterprise who hosts the VPS. How much is their configuration in terms of RAM and processor. And usually How many Virtual machines are installed or recommended on On

  • What are normal, tangent and binormal vectors and how are they used? 2013-03-20

    I would like to find out the following information: What are they? Example usage in game development (the area they are used in) About the following vector types: Normal Tangent Binormal A simple game development centric explanation would suffice. --

  • Answers to Episode 2 (Real-life regular expressions) 2006-11-28

    Yeah, I'm a little late getting these answers posted. Sorry! If you missed it, last week's challenge dealt with deciphering regular expressions and finding subtle bugs within 'em. As with last week, before getting to the actual answers please indulge

  • Reverse engineering: what is it really good for? 2012-08-15

    I have some innocent/beginner questions: What is reverse engineering good for? As a programmer, should I learn the art of reverse engineering? What are the benefits to a programmer who is experienced with it? --------------Solutions------------- What

  • What can I really do with percona toolkit's pt-table-sync? 2012-11-19

    I have been looking for a tool to synchronize tables from 2 different databases and found pt-table-sync. I read the documentation and got confused: they mostly use examples referring to a replicated environment, but I thought the whole point of repli

  • What would cause really sporadic packet loss, when signal strength is fine 2013-06-14

    I have a router connected to a modem. My intention is to connect to this router from an office that's about 200 feet away and through 2 walls. The router signal is not strong enough to get to the remote location, so I have put a range extender in bet

  • What is "normal" activity on the loopback (lo) network interface? 2013-06-23

    I'm trying to diagnose intermittent poor wifi performance on my laptop, which has a fresh installation of 13.04. I installed several network monitoring tools, including "slurm", which I'm now using to monitor both the wireless interface (wlan0)

  • What does "Normal Shutdown, Thank you for playing [preauth]" In SSH logs mean? 2013-12-04

    Recently, My SSH log summaries for my Ubuntu 12.04 servers in Logwatch have started showing entries for "11: Normal Shutdown, Thank you for playing [preauth]" along with the "11: Bye Bye [preauth]" and "11: disconnected by user&qu

  • If a computer has 22gb ssd that is "100% free" - what is it really doing? 2014-10-05

    I am a Windows newbie, and just ran into a computer with 22GB SSD drive and a 1TB normal hard drive. The SSD drive according to diskmgmt.msc says its space is 100% free. What does this mean? Also, if one wanted to dual boot Ubuntu on a machine like t

  • How can I seem not sketchy /what's "normal" when sharing code to company without giving them access to my server/github repo? 2016-06-11

    First I understand this question is similar to Hand over source code to customer and Freelancing - Share the source code? but they don't seem to answer my question. A private company allowed me to use their API to make REST API calls (using Basic Aut

iOS development

Android development

Python development

JAVA development

Development language

PHP development

Ruby development

search

Front-end development

Database

development tools

Open Platform

Javascript development

.NET development

cloud computing

server

Copyright (C) avrocks.com, All Rights Reserved.

processed in 2.628 (s). 13 q(s)