Wed 23 Apr 2008
Bitmasks are sometimes used to store the state of elements, for example a read/unread flag on the comments to an article for each user. For that particular case it would be better to store the time of the user's last view of the entry, but other tasks can arise in which binary data has to be remembered for elements that have no time field.
You can use an M2M relation between a user and an object. The drawback is the considerable resources needed for the index and for the number of entries when there are many users and objects. In that case it makes sense to use a bitmask, which does not have this shortcoming.
The idea is that there is only one entry in the database for each user, and the position of a byte within this entry, together with the position of a bit within that byte, determines the state of a particular element. Normally Django has no field that supports blob elements (binary data) in the database. That is why I used an ordinary TextField, in which one symbol holds the state of four elements. This means an entry of 250 bytes describes the state of 1000 elements.
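To make the packing concrete, here is a small sketch in plain Python 3, independent of Django, of how an element ID could map to a symbol and to a bit within that symbol. The helper name `locate` is hypothetical and not part of the original code; it assumes IDs start at 1 and that each symbol is a single hexadecimal digit.

```python
def locate(num):
    """Return (symbol index, bit value) for a 1-based element ID."""
    if num < 1:
        raise ValueError("ID must be a positive integer")
    symbol_index, bit_index = divmod(num - 1, 4)
    return symbol_index, 1 << bit_index


# IDs 1..4 share symbol 0, IDs 5..8 share symbol 1, and so on.
for num in (1, 4, 5, 1000):
    print(num, locate(num))

# The symbol "A" (binary 1010) marks the 2nd and 4th IDs of its group.
flags = int("A", 16)
print([bool(flags & (1 << bit)) for bit in range(4)])
```

ID 1000 lands in symbol 249, the last of 250 symbols, which matches the figure of 250 bytes for 1000 elements.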
For example, suppose there are 10,000 jokes (this method is appropriate when there are many elements, which usually means the articles are small :)
Suppose there are 100,000 users. Then our connection model looks like this:
class Readed(models.Model):
    user = models.ForeignKey(User, related_name='readed')
    readed_flags = models.TextField(blank=True)

The number of entries is equal to the number of users, that is, 100,000.
The size of readed_flags is at most 2500 bytes (10,000 / 4).
With our solution we don't need to store all 2500 bytes, only as many bytes as the largest ID of a read article divided by 4, rounded up.
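The size estimate can be checked with a few lines of arithmetic. This is only a sketch: the function name `symbols_needed` is hypothetical, and it assumes one byte per stored symbol and IDs starting at 1.

```python
def symbols_needed(largest_read_id):
    """Symbols stored when largest_read_id is the highest ID marked as read."""
    if largest_read_id < 1:
        return 0  # nothing read yet, so the field stays empty
    return (largest_read_id - 1) // 4 + 1


print(symbols_needed(10000))  # 2500: every joke up to the last one
print(symbols_needed(37))     # 10: only the first 37 IDs are covered

# Worst case for 100,000 users who have each read the newest joke.
print(100000 * symbols_needed(10000), "bytes in total")
```

A user who has only read low-numbered jokes therefore costs a handful of bytes, while a user who has read the newest joke costs the full 2500.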
So we have two functions: is_readed(id) and set_readed(id).
is_readed(id) returns the state (true/false) for the given ID.
set_readed(id) sets the state to "true".
After calling set_readed(), don't forget to save the changes to the database with save().
I deliberately did not put save() inside set_readed(), so that the flags can be set quickly for a large number of articles and the changes saved to the database all at once.
Here is the model code:
from django.db import models
from django.contrib.auth.models import User

# Bit values for the four elements packed into one hex symbol
READED_FLAG = [1, 2, 4, 8]


class Readed(models.Model):
    # Note: Django 2.0 and later also require an on_delete argument here,
    # e.g. on_delete=models.CASCADE.
    user = models.ForeignKey(User, related_name='readed')
    readed_flags = models.TextField(blank=True)

    def dec2hex(self, n):
        return "%X" % n

    def hex2dec(self, s):
        return int(s, 16)

    def is_readed(self, num):
        if num < 1:
            raise ValueError("Must be positive and non-zero")
        byte_position, bit_position = divmod(num - 1, 4)
        # Symbols beyond the end of the stored string count as unread
        if byte_position >= len(self.readed_flags):
            return False
        four_status = self.hex2dec(self.readed_flags[byte_position])
        return (four_status & READED_FLAG[bit_position]) != 0

    def set_readed(self, num):
        if num < 1:
            raise ValueError("Must be positive and non-zero")
        byte_position, bit_position = divmod(num - 1, 4)
        # Pad with "0" symbols only when the string is too short
        notpresent = byte_position + 1 - len(self.readed_flags)
        if notpresent > 0:
            self.readed_flags += "0" * notpresent
        four_status = self.hex2dec(self.readed_flags[byte_position])
        four_status_new = self.dec2hex(four_status | READED_FLAG[bit_position])
        self.readed_flags = (self.readed_flags[:byte_position] +
                             four_status_new +
                             self.readed_flags[byte_position + 1:])
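To experiment with the technique without a database, the same logic can be sketched as a plain Python 3 class. The class name `ReadFlags` and its method names are hypothetical stand-ins for the model, not part of the original code, and the sketch assumes the flags are kept in an ordinary string of hex symbols.

```python
class ReadFlags:
    """Plain-Python stand-in for the model: one hex symbol per four IDs."""

    def __init__(self, flags=""):
        self.flags = flags

    @staticmethod
    def _locate(num):
        if num < 1:
            raise ValueError("ID must be a positive integer")
        return divmod(num - 1, 4)

    def is_read(self, num):
        index, bit = self._locate(num)
        if index >= len(self.flags):
            return False  # never stored, so treated as unread
        return bool(int(self.flags[index], 16) & (1 << bit))

    def set_read(self, num):
        index, bit = self._locate(num)
        # Pad with "0" symbols so that position `index` exists.
        padded = self.flags.ljust(index + 1, "0")
        symbol = "%X" % (int(padded[index], 16) | (1 << bit))
        self.flags = padded[:index] + symbol + padded[index + 1:]


state = ReadFlags()
for joke_id in (1, 2, 10):  # flag several IDs first, persist once afterwards
    state.set_read(joke_id)
print(state.flags)
print(state.is_read(2), state.is_read(3), state.is_read(10))
```

With the Django model, the equivalent pattern would be to call set_readed() in the loop and save() once after it.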
This method can be applied much more broadly than as a substitute for an M2M model. It depends on your imagination.

April 25th, 2008 at 1:32 p.m.
Great solution, I've made a note of it.
April 25th, 2008 at 3:56 p.m.
A small snarky remark :)
It would be more correct to pack all of this into a separate field (for example, one inherited from django.models.fields.TextFields) and call it BitMaskField()
April 25th, 2008 at 4:56 p.m.
I wouldn't say "more correct"; it is just a different programming style, and everyone picks the solution that suits their own taste ;)
July 5th, 2008 at 7:28 p.m.
The solution is interesting, but I think there is a small problem buried here. What users mostly read is, as a rule, the latest jokes, so we will simply be prepending 2500 zeros to every user's entry for nothing, and that number will only grow with each new joke.
I think it would be a _bit_ more logical for such a system to store 2 numbers, something like this: "9483;200000A03C", which means 9483 zeros, then 200...3C. This would save both space and time (the DB would not need to transfer extra thousands of bytes for each user). And to avoid the overhead of split(';'), it could be stored as "00009483 ", and then int(field[:8]) is the number of zeros and str(field[9:]) is the variable part.
It would be saved, of course, as "%08d;%s" % (..)
But the solution, as I said, is interesting.