If you will for a moment, please turn your attention the every present battle against tangled headphone cords. These zipper designs brought to you by designer Ji Woong, help take away the hassle and even add some spunk with quick zip technology. These headphones even come complete with remote control action on the zipper tab. It just goes to show you, sometimes the best solution is the easiest.
MySQL’s ENUM data-type is a hot spot that sometimes generates strong opinions among developers. At first glance it seems like an efficient solution for declaring a set of permitted values from which only one can be chosen for each record. A good example is a having a table of countries with a ‘continent’ column: every country belongs to a continent and the seven continents aren’t likely to change anytime soon. Yes, one day North America may collide with Asia to form Noramersia, but if your database somehow survives that long at least you won’t be around for the meeting on how to restructure your tables. That will be some new guy’s problem.
Now, If using ENUM were our only option for representing what continent a country belonged to, we could all move on to debating the merits of NoSQL or whether Git was better than SVN, or why your favorite framework sucks. But there is a superior option to ENUM in most cases: the lookup/joined/reference table. (We’ll call them reference tables.) It’s pretty simple:
Wikipedia describes a reference table as:
…a table into which an enumerated set of possible values of a certain field data type is divested. For example, in a relational database model of a warehouse the entity ‘Item’ may have a field called ‘status’ with a predefined set of values such as ‘sold’, ‘reserved’, ‘out of stock’. In a purely designed database these values would be divested into an extra entity or Reference Table called ‘status’ in order to achieve database normalization.
So, with reference tables available as an option to represent an enumeration, let’s look at why the ENUM data-type is evil:
Male/Female, Mr/Mrs/Ms, Africa/Asia/etc: these bits of text that people use ENUM columns for are data. When you use an ENUM column, you’re technically moving data from where it belongs (in actual database fields), to somewhere it doesn’t (into the database metadata, specifically a column definition). This is different than putting constraints on the data, which is what we are doing when we say that a numeric column can only hold an integer, or that a date column can’t be null - that’s fine and quite necessary. With an ENUM we’re actually storing pieces of data in a place that was only intended to hold crucial information about the model. In short, an ENUM column violates the rules of normalization. This may seem academic or pedantic, but it is actually why a lot of the other reasons on this list can be problems!
Invariably, what happens is this: you create an ENUM column and say “no way NEVER will this list change or need added to”. But humans are really poor at estimating the entire scope of something, and even worse at predicting the future. R&D dreams up a whole new product type. Your company adds another shipping method. North America crashes into Asia.
The problem is that changing the member list for an ENUM column restructures the entire table with ALTER TABLE, which can be very expensive on resources and time. If you have
ENUM('red', 'blue', 'black') but need to change it to
ENUM('red', 'blue', 'white'), MySQL needs to rebuild your table and look through every record to check for the now-invalid value ‘black’. MySQL is literally dumb and will even do this when all you did was add a new value to the end of the member list! (It is rumored that appending an ENUM member list will be handled better in the future, but I doubt that this is a high priority feature.)
A full-table rebuild may not cause much pain on a small table, but on a large one it is possible to peg your resources for a long time. If you use a reference table instead of ENUM, changing the list is as simple as INSERT, UPDATE, or DELETE, laughably-cheap operations by comparison. It’s also important to note that when altering an ENUM member list, MySQL converts any existing record values that are not included in the new ENUM definition to ‘ ’ (an empty string). With a reference table, you have greater flexibility when renaming or eliminating list choices (more on this below).
There just isn’t any saneway to add related information to an ENUM column, which is a common scenario that often comes up. In our country/continent example, what happens when we need to store something like land area for each continent? We didn’t envision needing that attribute, but now we do. With a reference table, we can simply extend the continent table to include a ‘land_area’ column and query this new data any way we would like. With an ENUM? Forget it.
Other awesome flexibilities exist due to the ability to easily extend a reference table. One common scenario is adding a column to set a flag to denote whether a choice in the reference table is discontinued. So, when your company stops selling black widgets, you can add an ‘is_discontinued’ column to the reference table and flag the old ‘black’ row. You can still query a list of currently offered colors, and maintain info about all your old orders of black widgets! Try that with an ENUM column.
A very common need is to populate a select-box or drop down list with possible values from the database. Like this:
If these values are stored in a reference table named ‘colors’, all you need is:
SELECT * FROM colors …which can then be parsed out to dynamically generate the drop down list. You can add or change the colors in the reference table, and your sexy order forms will automatically be updated. Awesome.
Now consider the evil ENUM: how do you extract the member list? You could query the ENUM column in your table for DISTINCT values but that will only return values that are actually used and present in the table, not necessarily all possible values. You can query INFORMATION_SCHEMA and parse them out of the query result with a scripting language, but that’s unnecessarily complicated. In fact, I don’t know of any elegant, purely SQL way to extract the member list of an ENUM column.
The usual justifications for using ENUM, are centered around optimization, in the conventional sense of performance gains, and sometimes in the sense of simplifying a complicated model to be more comprehensible.
Let’s look at performance. You can do a surprising number of un-optimized things with databases, but most won’t affect performance until a certain scale is reached, and often our applications are never asked to scale up that far. This is important to remember because DB devs should aspire to design fully-normalized and only de-normalize when a performance problem becomes real. If you’re concerned that a reference table is going to slow things down, benchmark it out both ways in your unique application on an actual dataset (or a realistic high-estimate fake dataset) and see. Just don’t automatically assume a join or a reference table is going to be a bottleneck, because it probably isn’t. (There is also evidence to support that ENUM isn’t always appreciably faster than alternatives.)
The second optimization argument for ENUM is that it reduces the number of tables and foreign keys in your database. This is a valid argument, in the sense that it’s one more little box joined to another box with some lines, and in large systems the effect of normalization can already tax the limits of human comprehension and complicate queries. This is however, why we make models, and why those models employ abstraction so we can understand them. Go ahead and draw up a new representation of your model or ER diagram that leaves out some of the little details and reference tables. Sometimes it may just seem easier to use an ENUM, but the fact that you think another reference table makes things too complicated isn’t a good reason by itself.
When you create a list of possible members in an ENUM column, there’s no easy and consistent way to re-use that list in other tables. With a reference table, the same set of data can be related to as many other tables as required. Changing the list in the lone reference table, will change the available options in every other table that it is linked or joined to.
With separate ENUM columns, you would have identical duplicate member lists on two different columns in at least two different tables (that would all require consistent updating).
Suppose you have ENUM(‘blue’, ‘black’, ‘red’) and you attempt to insert ‘purple’: MySQL actually truncates the illegal value to ‘ ’ (an empty string). This is correct, but if we had used a reference table with a foreign key, we would have more robust data integrity enforcement.
Also, MySQL stores enum values internally as integer keys to reference ENUM members. It’s easy to end up referencing the index instead of the value and vice-versa. Consider:
test (foobar ENUM(
Query OK, 2
affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
| foobar |
| 1 |
| 0 |
We inserted ‘1’ (a string), and accidentally also inserted 1 (as a number, without quotes). MySQL correctly (but confusingly) uses our number input as an internal reference to the first item in the member list (which is actually the value ‘0’).
The ENUM data type isn’t standard SQL, and beyond MySQL not many other DBMS’s have native support for it. PostgreSQL, MariaDB, and Drizzle (the latter two are forks of MySQL anyway), are the only three that I know of. Should you or someone else want to move your database to another system, someone is going to have to add more steps to the migration procedure to deal with all of your clever ENUMs. If it’s you, you’ll undoubtedly feel less clever than you once did - and if it’s someone else, they may not like you. Generally, migrating to a different database system is something that just doesn’t happen that often and everybody assumes will bring out demons anyway, which is why this just squeaks in at number 8 on the list.
A fairly good example that meets this criteria is our list of the continents. These are well-defined. Other commonly-given examples are salutations: Mr/Mrs/Ms, or playing card suits Spades/Hearts/Diamonds/Clubs. However, consider that even these examples have scenarios where you may need to extend the member list (such as when someone demands that you now need a ‘Dr.’ salutation, or when your card game app needs to accommodate a non-suited card like the Joker).
Consider again Spades/Hearts/Diamonds/Clubs. There are popular card games that rely on the fact that clubs/spades are black and hearts/diamonds are red (Euchre, for example.) What happens when we need to store additional info related to suit, such as its color? If we had used a reference table, it would be a trivial matter to add this color data to the reference table in an additional column. If we use an ENUM to represent suit, it becomes much more difficult to represent the color/suit model accurately, and we’re going to have to enforce it on the application level.
If you’re using an ENUM for only two values, you can always replace the ENUM with a very efficient TINYINT(1) or the even-better BIT(1) available since MySQL 5.0.3. For example:
gender ENUM('male', 'female') can be changed to:
is_male BIT(1). When you only have two choices, they can always be expressed as a Boolean true/false by prepending “is” to one of the member strings and renaming the column. As for less than 20: Yes, ENUM can store up to 65,535 values. No, you shouldn’t try it. More than 20 becomes unwieldy and certainly more than 50 is just insane to manage and work with.
There’s a reason ENUM is a string data type. Not only should you be using anumeric data type to store numbers, but ENUM has some well-documented gotchas related to the fact that MySQL references ENUM members internally using a numerical index. (See #7 above.) Just don’t ever store numbers in an ENUM data type, ok?
Strict mode will at least throw an error when you try to insert an invalid value into an ENUM column. Otherwise only a warning is thrown and the value is simply set to an empty string ‘ ’ (referenced internally as 0). Note: Errors can still be suppressed in strict mode if you use IGNORE.
Do what makes sense from a development/maintenance perspective, and optimize only once a performance problem becomes real - in most cases that is a strong argument for using reference tables over MySQL’s ENUM datatype.
There is no doubt that the grail of efficiency leads to abuse. Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.
Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. - Donald Knuth
Researchers Develop Most Efficient LEDs
Engineering researchers at the Univ. of Toronto have developed the world’s most efficient organic light-emitting diodes (OLEDs) on plastic. This result enables a flexible form factor, not to mention a less costly, alternative to traditional OLED manufacturing, which currently relies on rigid glass.
Read more: http://www.laboratoryequipment.com/news-Flexible-Organic-Light-Emitting-Diodes-on-Plastic-110211.aspx
Smart Chimp Gets Speech Like a Human
Panzee the chimp understands words including “tickle”, “lemonade” and “M&M”, which makes it sound like that chimp leads a pretty fun and delicious life. This helps refute the idea that humans are the only species able to process and produce speech - the “Speech is Special” theory.
A 25-year-old chimpanzee named “Panzee” has just demonstrated that speech perception is not a uniquely human trait.
Well-educated Panzee understands more than 130 English language words and even recognizes words in sine-wave form, a type of synthetic speech that reduces language to three whistle-like tones. This shows that she isn’t just responding to a particular person’s voice or emotions, but instead she is processing and perceiving speech as humans do.
(via Discovery News)
Hace algunos días me surgió la duda de cual es la diferencia entre Overload y Override cuando se trata de programar:
Overload es cuando se crean varias implementaciones para el mismo método, de tal manera que se pueda utilizar en distintos contextos, como por ejemplo:
function menor(int a, int b) … //Retorna el menor de dos números enteros
function menor(date a, date b)… //Retorna la menor de las fechas
Las dos funciones se llaman igual (menor), pero son distintas implementaciones.
Override lo utilizamos cuando estamos sobrescribiendo una función; en otras palabras cuando una subclase necesita implementar una función en específico, por ejemplo cuando estamos heredando una función toString() de una clase y queremos cambiar el comportamiento de dicha función, NO tenemos que ir a editar directamente la función en al clase padre, si no que solo le hacemos un override a la función toString().
Creo que esta situación nos ha tocado a muy pocos pasarla. Resulta que en el lugar donde laboro actualmente en algunas tablas se estaban utilizando llaves primarias de tipo CHAR(36); para ser exactos estabamos utilizando UUIDs; y por cuestiones de rendimiento y otros detalles (que no discutiré en este post) decidimos empezar a cambiar estos UUID por enteros. Seré sincero, es una tarea bastante engorrosa, pero al final logramos lo que necesitabamos. Los pasos que tuve que seguir son los siguientes:
SET @a = 0;
UPDATE tabla SET ID = @a:=@a+1;
Y listo! =)
Design is how it works.
- Steve Jobs
Dentro de mis labores diarias me topé con una incognita, ¿Cómo demonios puedo verificar si el ICCID de una SIM card, esta bien escrito? … bueno la respuesta es muy simple. Después de un rato de investigación me di cuenta que el algoritmo que utiliza el ICCID para calcular el digito de verificación o mejor conocido por su nombre en ingles: checksum digit; se llama algoritmo de Luhn, el cual les voy a explicar en simple y vulgar español.
Tomaremos como ejemplo el siguiente ICCID: 8950301211030316837 (el ultimo digito es el checksum digit); lo primero que tenemos que hacer es multiplicar por 2 un digito sí y un digito no, sin tomar en cuenta el 7 (de derecha a izquierda).
3 8 6 1 3 0 3 0 1 1 2 1 0 3 0 5 9 8
6 8 12 1 6 0 6 0 2 1 4 1 0 3 0 5 18 8
Luego sumamos los digitos de las cifras de dos digitos (si al multiplicar 6 x 2 nos da 12; entonces sumamos 1 + 2) y nos queda:
3 8 6 1 3 0 3 0 1 1 2 1 0 3 0 5 9 8
6 8 3 1 6 0 6 0 2 1 4 1 0 3 0 5 9 8 = 63
Luego solo sumamos todos los dígitos de la siguiente manera:
63 + el digto de verificacion = 63 + 7 = 70
Al final hacemos una división entera para obtener el resto (Mod):
70 mod 10 = 0
Y si el resultado es 0, entonces el el ICCID es correcto. Acá les dejo el código que utilicé en Visual Basic 2010:
Function verifyICCID(ByVal ICCID As String, ByVal checksumDigit As Byte) As Boolean Dim ICCIDArray(ICCID.Length - 1) As Integer Dim sumatoria As Integer = 0 For i As Integer = 0 To ICCID.Length - 1 ICCIDArray(i) = ICCID.Substring(ICCID.Length - (i + 1), 1) Next For i As Integer = 0 To ICCIDArray.GetUpperBound(0) Step 2 ICCIDArray(i) = ICCIDArray(i) * 2 Next For Each numero As Integer In ICCIDArray If numero < 10 Then sumatoria += numero ElseIf numero >= 10 Then Dim n As String = numero.ToString() sumatoria += Convert.ToInt16(n.Substring(0, 1)) + Convert.ToInt16(n.Substring(1, 1)) End If Next Return IIf((sumatoria + checksumDigit) Mod 10 = 0, True, False) End Function