Diseño, Arquitectura e Implementación de una Base de Datos Multimedia (PHP, Zend, jQuery y MySQL)

Introducción

La Base de Datos Multimedia (en adelante BBDDMM) permitirá el almacenamiento, indexación, consulta y recuperación de los siguientes tipos de elementos multimedia:

– Imágen (jpg, png, gif)
– Vídeo (mpeg, avi, mp4, flv, mkv, ogg, …)
– Audio (mp3, wav, …)
– Flash (swf)
– Script (js, php, css, …)
– Documento (doc, pdf, odf, odt, …)
– Software (exe, msi, xpi, …)
– Ficheros (zip, rar, fichero genérico, …)

Cada uno de los tipos mencionados tiene sus particularidades, aunque todos ellos tienen también datos en común.

AUDIO

CREATE TABLE `cc_media_audio` (
`id` int(11) NOT NULL auto_increment,
`name` varchar(255) default NULL,
`path` varchar(255) default NULL,
`size` int(10) default NULL,
`type` varchar(255) default ”,
`mime` varchar(255) default ”,
`year` smallint(4) default NULL,
`artist` varchar(255) default ”,
`album` varchar(255) default ”,
`track` tinyint(2) default NULL,
`title` varchar(255) default ”,
`comment` text,
`tags` text,
`genre` varchar(255) default ”,
`copyright` varchar(255) default ”,
`layer` varchar(255) default ”,
`stereo` varchar(32) default ”,
`goodframes` int(6) default NULL,
`badframes` int(6) default NULL,
`frequency` int(6) default NULL,
`playtime` int(10) default NULL,
`created` datetime default NULL,
PRIMARY KEY  (`id`),
KEY `name` (`name`)
) ENGINE=MyISAM;

IDEAS PARA ALMACENAR GRANDES CANTIDADES DE FICHEROS E INDEXARLOS CON MYSQL

1. To clarify your last sentence, you’re recommending storing the hashed filename in the database and the file itself on the file system, correct? Also, I’d add as a disadvantage of storing all the files in one folder that if you end up with a very large number of files in that folder, listing the directory contents can become inordinately slow using traditional methods; see here

@Mitch: You’re correct, I would store the hashed filename in the database, not the file itself. Storing all the files in one folder could be a disadvantage, but we’d be talking millions before it was an issue. In that case, using dma_k’s strategy with mine would (theoretically) yield good results

I would go even further: calculate MD5 sum of each file before storing it to filesystem. You may use first two characters as the directory name of 1st level, next two characters as a directory of 2nd level:

vv 1st level
61f57fe906dffc16597b7e461e5fce6d.jpg
  ^^ 2nd level

As the a hashing algorithm has equal distribution, this will distribute your files equally among folders (the idea comes from how Squid organizes it’s file cache). The server should return URL like this (e.g. no notion about directories):

http://server.com/images/61f57fe906dffc16597b7e461e5fce6d.jpg

and you may apply mod_rewrite to actually rewrite this url to something like this:

/storage/images/61/f5/7fe906dffc16597b7e461e5fce6d.jpg

This will also add some degree anonymity and hide the real image name. More over, if your clients will intend to upload the same contents, it will end up in the same file, which will save your disc space. Beware when removing the file from one client: it may also be used by others!

Ver: http://www.olark.com/spw/2011/08/you-can-list-a-directory-with-8-million-files-but-not-with-ls/

2. Have a file server served by a HTTPD server (e.g. Apache HTTPD) for these static contents.

You can store into the DBMS file locations of the static contents instead of saving as a BLOB this way.

For redirecting traffic, you can store all images (even static files like CSS) in a separate server , like http://staticimg.yourdomain.com . With this way, your web server will be less busy serving download requests.

I would go even further: calculate MD5 sum of each file before storing it to filesystem. You may use first two characters as the directory name of 1st level, next two characters as a directory of 2nd level:

vv 1st level
61f57fe906dffc16597b7e461e5fce6d.jpg
  ^^2nd level

As the a hashing algorithm has equal distribution, this will distribute your files equally among folders (the idea comes from how Squid organizes it’s file cache). The server should return URL like this (e.g. no notion about directories):

http://server.com/images/61f57fe906dffc16597b7e461e5fce6d.jpg

and you may apply mod_rewrite to actually rewrite this url to something like this:

/storage/images/61/f5/7fe906dffc16597b7e461e5fce6d.jpg

This will also add some degree anonymity and hide the real image name. More over, if your clients will intend to upload the same contents, it will end up in the same file, which will save your disc space. Beware when removing the file from one client: it may also be used by others!

Responder

Introduce tus datos o haz clic en un icono para iniciar sesión:

Logo de WordPress.com

Estás comentando usando tu cuenta de WordPress.com. Cerrar sesión / Cambiar )

Imagen de Twitter

Estás comentando usando tu cuenta de Twitter. Cerrar sesión / Cambiar )

Foto de Facebook

Estás comentando usando tu cuenta de Facebook. Cerrar sesión / Cambiar )

Google+ photo

Estás comentando usando tu cuenta de Google+. Cerrar sesión / Cambiar )

Conectando a %s