3 See the NOTICE file distributed with
this work
for additional information
4 regarding copyright ownership.
6 Licensed under the Apache License, Version 2.0 (the
"License");
7 you may not use
this file except in compliance with the License.
8 You may obtain a copy of the License at
12 Unless required by applicable law or agreed to in writing, software
13 distributed under the License is distributed on an
"AS IS" BASIS,
14 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
15 See the License
for the specific language governing permissions and
16 limitations under the License.
23 Please email comments or questions to the
public Ensembl
24 developers list at <http:
26 Questions may also be sent to the Ensembl help desk at
31 # This package, originally distributed by CPAN, has been modified from
32 # its original version in order to be used by the ensembl project.
34 # 8 July 2002 - changed package name
37 package Bio::EnsEMBL::Utils::Cache;
41 $VERSION $Debug $STRUCT_SIZE $REF_SIZE
42 $BEFORE $AFTER $KEY $VALUE $BYTES $DIRTY
46 $Debug = 0; # set to 1
for summary, 2
for debug output
47 $STRUCT_SIZE = 240; # per cached elem bytes overhead, approximate
62 Tie::Cache - LRU Cache in Memory
67 tie %cache,
'Tie::Cache', 100, { Debug => 1 };
68 tie %cache2,
'Tie::Cache', { MaxCount => 100, MaxBytes => 50000 };
69 tie %cache3,
'Tie::Cache', 100, { Debug => 1 , WriteSync => 0};
71 # Options ##################################################################
73 # Debug => 0 - DEFAULT, no debugging output
74 # 1 - prints cache statistics upon destroying
75 # 2 - prints detailed debugging info
77 # MaxCount => Maximum entries in cache.
79 # MaxBytes => Maximum bytes taken in memory for cache based on approximate
80 # size of total cache structure in memory
82 # There is approximately 240 bytes used per key/value pair in the cache for
83 # the cache data structures, so a cache of 5000 entries would take
84 # at approximately 1.2M plus the size of the data being cached.
86 # MaxSize => Maximum size of each cache entry. Larger entries are not cached.
87 # This helps prevent much of the cache being flushed when
88 # you set an exceptionally large entry. Defaults to MaxBytes/10
90 # WriteSync => 1 - DEFAULT, write() when data is dirtied for
91 # TRUE CACHE (see below)
92 # 0 - write() dirty data as late as possible, when leaving
93 # cache, or when cache is being DESTROY'd
95 ############################################################################
97 # cache supports normal tied hash functions
98 $cache{1} = 2; # STORE
99 print
"$cache{1}\n"; # FETCH
102 while(($k, $v) = each %cache) { print
"$k: $v\n"; }
104 delete $cache{1}; # DELETE
109 This module implements a least recently used (LRU) cache in memory
110 through a tie interface. Any time data is stored in the tied hash,
111 that key/value pair has an entry time associated with it, and
112 as the cache fills up, those members of the cache that are
113 the oldest are removed to make room
for new entries.
115 So, the cache only
"remembers" the last written entries, up to the
116 size of the cache. This can be especially useful
if you access
117 great amounts of data, but only access a minority of the data a
118 majority of the time.
120 The implementation is a hash,
for quick lookups,
121 overlaying a doubly linked list
for quick insertion and deletion.
122 On a WinNT PII 300, writes to the hash were done at a rate
123 3100 per second, and reads from the hash at 6300 per second.
124 Work has been done to optimize refreshing cache entries that are
125 frequently read from, code like $cache{entry}, which moves the
126 entry to the end of the linked list internally.
131 my($class, $max_count, $options) = @_;
133 if(ref($max_count)) {
134 $options = $max_count;
135 $max_count = $options->{MaxCount};
138 unless($max_count || $options->{MaxBytes}) {
139 die(
'you must specify cache size with either MaxBytes or MaxCount');
142 my $sync = exists($options->{WriteSync}) ? $options->{WriteSync} : 1;
146 # how many items to cache
147 max_count=> $max_count,
150 max_bytes => $options->{MaxBytes},
152 # max size (in bytes) of an individual cache entry
153 max_size => $options->{MaxSize} || ($options->{MaxBytes} ? (int($options->{MaxBytes}/10) + 1) : 0),
155 #
class track, so know
if overridden subs should be used
157 'subclass' => $class ne
'Tie::Cache' ? 1 : 0,
175 dbg => $options->{Debug} || $Debug
180 if (($self->{max_bytes} && ! $self->{max_size})) {
181 die(
"MaxSize must be defined when MaxBytes is");
184 if($self->{max_bytes} and $self->{max_bytes} < 1000) {
185 die(
"cannot set MaxBytes to under 1000, each raw entry takes $STRUCT_SIZE bytes alone");
188 if($self->{max_size} && $self->{max_size} < 3) {
189 die(
"cannot set MaxSize to under 3 bytes, assuming error in config");
195 # override to write data leaving cache
197 # commented this section out for speed
198 # my($self, $key, $value) = @_;
202 # override to get data if not in cache, should return $value
203 # associated with $key
205 # commented this section out for speed
206 # my($self, $key) = @_;
211 my($self, $key) = @_;
213 my $node = $self->{nodes}{$key};
215 # refresh node's entry
216 $self->{hit}++; #
if $self->{dbg};
218 # we used to call delete then insert, but we streamlined code
219 if(my $after = $node->[$AFTER]) {
220 $self->{dbg} > 1 and $self->print(
"update() node $node to tail of list");
221 # reconnect the nodes
222 my $before = $after->[$BEFORE] = $node->[$BEFORE];
224 $before->[$AFTER] = $after;
226 $self->{head} = $after;
230 $self->{tail}[$AFTER] = $node;
231 $node->[$BEFORE] = $self->{tail};
232 $node->[$AFTER] = undef;
233 $self->{tail} = $node; # always
true after
this
235 # if there is nothing after node, then we are at the end already
236 # so don't do anything to move the nodes around
237 die(
"this node is the tail, so something's wrong")
238 unless($self->{tail} eq $node);
241 $self->print(
"FETCH [$key, $node->[$VALUE]]")
if ($self->{dbg} > 1);
244 # we have a cache miss here
245 $self->{miss}++; #
if $self->{dbg};
247 # its fine to always insert a node, even when we have an undef,
248 # because even if we aren't a sub-class, we should assume use
249 # that would then set the entry. This model works well with
250 # sub-classing and reads() that might want to return undef as
253 if ($self->{subclass}) {
254 $self->print(
"read() for key $key")
if $self->{dbg} > 1;
255 $value = $self->read($key);
260 if($self->{max_size}) {
261 # check max size of entry, that it not exceed max size
262 $length = &_get_data_length(\$key, \$value);
263 if($length > $self->{max_size}) {
264 $self->print(
"direct read() [$key, $value]")
if ($self->{dbg} > 1);
268 # if we get here, we should insert the new node
269 $node = &create_node($self, \$key, \$value, $length);
270 &insert($self, $node);
279 my($self, $key, $value) = @_;
282 $self->print(
"STORE [$key,$value]")
if ($self->{dbg} > 1);
284 # do not cache undefined values
285 defined($value) ||
return(undef);
287 # check max size of entry, that it not exceed max size
289 if($self->{max_size}) {
290 $length = &_get_data_length(\$key, \$value);
291 if($length > $self->{max_size}) {
292 if ($self->{subclass}) {
293 $self->print(
"direct write() [$key, $value]")
if ($self->{dbg} > 1);
294 $self->write($key, $value);
300 # do we have node already ?
301 if($self->{nodes}{$key}) {
302 $node = &
delete($self, $key);
303 # $node = &delete($self, $key);
304 # $node->[$VALUE] = $value;
305 # $node->[$BYTES] = $length || &_get_data_length(\$key, \$value);
309 $node = &create_node($self, \$key, \$value, $length);
310 # $node ||= &create_node($self, \$key, \$value, $length);
311 &insert($self, $node);
313 # if the data is sync'd call write now, otherwise defer the data
314 # writing, but mark it dirty so it can be cleanup up at the end
315 if ($self->{subclass}) {
317 $self->print(
"sync write() [$key, $value]")
if $self->{dbg} > 1;
318 $self->write($key, $value);
328 my($self, $key) = @_;
330 $self->print(
"DELETE $key")
if ($self->{dbg} > 1);
331 my $node = $self->delete($key);
332 $node ? $node->[$VALUE] : undef;
338 $self->print(
"CLEAR CACHE")
if ($self->{dbg} > 1);
340 if($self->{subclass}) {
341 my $flushed = $self->flush();
342 $self->print(
"FLUSH COUNT $flushed")
if ($self->{dbg} > 1);
346 while($node = $self->{head}) {
347 $self->delete($self->{head}[$KEY]);
354 my($self, $key) = @_;
355 exists $self->{nodes}{$key};
358 # firstkey / nextkey emulate keys() and each() behavior by
359 # taking a snapshot of all the nodes at firstkey, and
360 # iterating through the keys with nextkey
362 # this method therefore will only supports one each() / keys()
363 # happening during any given time.
368 $self->{
'keys'} = [];
369 my $node = $self->{head};
371 push(@{$self->{
'keys'}}, $node->[$KEY]);
372 $node = $node->[$AFTER];
375 shift @{$self->{
'keys'}};
379 my($self, $lastkey) = @_;
380 shift @{$self->{
'keys'}};
386 # if debugging, snapshot cache before clearing
388 if($self->{hit} || $self->{miss}) {
390 sprintf(
"%4.3f", $self->{hit} / ($self->{hit} + $self->{miss}));
392 $self->print($self->pretty_self());
393 if($self->{dbg} > 1) {
394 $self->print($self->pretty_chains());
398 $self->print(
"DESTROYING")
if $self->{dbg} > 1;
404 ####PERL##LRU##TIE##CACHE##PERL##LRU##TIE##CACHE##PERL##LRU##TIE##CACHE
406 ####PERL##LRU##TIE##CACHE##PERL##LRU##TIE##CACHE##PERL##LRU##TIE##CACHE
408 # we use scalar_refs for the data for speed
410 my($self, $key, $value, $length) = @_;
411 (defined($$key) && defined($$value))
412 || die(
"need more localized data than $$key and $$value");
414 # max_size always defined when max_bytes is
415 if (($self->{max_size})) {
416 $length = defined $length ? $length : &_get_data_length($key, $value)
421 # ORDER SPECIFIC, see top for NODE ARRAY STRUCT
422 my $node = [ $$key, $$value, $length ];
425 sub _get_data_length {
426 my($key, $value) = @_;
430 my @data = ($$key, $$value);
431 while(my $elem = shift @data) {
432 next
if $refs{$elem};
434 if(ref $elem && $elem =~ /(SCALAR|HASH|ARRAY)/) {
436 $length += $REF_SIZE; # guess, 16 bytes per ref, probably more
437 if (($type eq
'SCALAR')) {
438 $length += length($$elem);
439 } elsif (($type eq
'HASH')) {
440 while (my($k,$v) = each %$elem) {
445 $length += length($kv);
449 } elsif (($type eq
'ARRAY')) {
450 for my $val (@$elem){
454 $length += length($val);
459 $length += length($elem);
467 my($self, $new_node) = @_;
469 $new_node->[$AFTER] = 0;
470 $new_node->[$BEFORE] = $self->{tail};
471 $self->print(
"insert() [$new_node->[$KEY], $new_node->[$VALUE]]")
if ($self->{dbg} > 1);
473 $self->{nodes}{$new_node->[$KEY]} = $new_node;
477 $self->{bytes} += $new_node->[$BYTES] + $STRUCT_SIZE;
480 $self->{tail}[$AFTER] = $new_node;
482 $self->{head} = $new_node;
484 $self->{tail} = $new_node;
486 ## if we are too big now, remove head
487 while(($self->{max_count} && ($self->{count} > $self->{max_count})) ||
488 ($self->{max_bytes} && ($self->{bytes} > $self->{max_bytes})))
490 if($self->{dbg} > 1) {
491 $self->print(
"current/max: ".
492 "bytes ($self->{bytes}/$self->{max_bytes}) ".
493 "count ($self->{count}/$self->{max_count}) "
496 my $old_node = $self->delete($self->{head}[$KEY]);
497 if ($self->{subclass}) {
498 if($old_node->[$DIRTY]) {
499 $self->print(
"dirty write() [$old_node->[$KEY], $old_node->[$VALUE]]")
500 if ($self->{dbg} > 1);
501 $self->write($old_node->[$KEY], $old_node->[$VALUE]);
504 # if($self->{dbg} > 1) {
505 # $self->print("after delete - bytes $self->{bytes}; count $self->{count}");
513 my($self, $key) = @_;
514 my $node = $self->{nodes}{$key} ||
return;
515 # return unless $node;
517 $self->print(
"delete() [$key, $node->[$VALUE]]")
if ($self->{dbg} > 1);
519 my $before = $node->[$BEFORE];
520 my $after = $node->[$AFTER];
522 # my($before, $after) = $node->{before,after};
524 ($before->[$AFTER] = $after);
526 $self->{head} = $after;
530 ($after->[$BEFORE] = $before);
532 $self->{tail} = $before;
535 delete $self->{nodes}{$key};
536 $self->{bytes} -= ($node->[$BYTES] + $STRUCT_SIZE);
545 $self->print(
"FLUSH CACHE")
if ($self->{dbg} > 1);
547 my $node = $self->{head};
550 if($node->[$DIRTY]) {
551 $self->print(
"flush dirty write() [$node->[$KEY], $node->[$VALUE]]")
552 if ($self->{dbg} > 1);
553 $self->write($node->[$KEY], $node->[$VALUE]);
557 $node = $node->[$AFTER];
564 my($self, $msg) = @_;
565 print
"$self: $msg\n";
572 for(sort keys %{$self}) {
573 next unless defined $self->{$_};
574 push(@prints,
"$_=>$self->{$_}");
577 "{ " . join(
", ", @prints) .
" }";
583 my $k = $self->FIRSTKEY();
586 my($curr_node) = $self->{head};
588 $str .=
"[$curr_node->[$KEY],$curr_node->[$VALUE]]->";
589 $curr_node = $curr_node->[$AFTER];
593 $curr_node = $self->{tail};
595 $str .=
"[$curr_node->[$KEY],$curr_node->[$VALUE]]->";
596 $curr_node = $curr_node->[$BEFORE];
609 Tie::Cache installs easily
using the make or nmake commands as
610 shown below. Otherwise, just copy Cache.pm to $PERLLIB/site/Tie
617 * use nmake
for win32
618 ** you can also just copy Cache.pm to $perllib/Tie
622 There is another simpler LRU cache implementation in CPAN,
623 Tie::Cache::LRU, which has the same basic size limiting
624 functionality, and
for this functionality, the exact same
627 Through healthy competition, Michael G Schwern got
628 Tie::Cache::LRU mostly faster than Tie::Cache on reads & writes:
630 Cache Size 5000 Tie::Cache 0.17 Tie::Cache::LRU 0.21
631 10000 Writes 1.55 CPU sec 1.10 CPU sec
632 40000 Reads 1.82 CPU sec 1.58 CPU sec
633 10000 Deletes 0.55 CPU sec 0.59 CPU sec
635 Unless you are
using TRUE CACHE or MaxBytes functionality,
636 using Tie::Cache::LRU should be an easy replacement
for Tie::Cache.
640 To use
class as a true cache, which acts as the sole interface
641 for some data set, subclass the real cache off Tie::Cache,
642 with @ISA = qw( 'Tie::Cache' ) notation. Then override
643 the read() method for behavior when there is a cache miss,
644 and the write() method for behavior when the cache's data
647 When WriteSync is 1 or TRUE (DEFAULT), write() is called immediately
648 when data in the cache is modified. If set to 0, data that has
649 been modified in the cache gets written out when the entries are deleted or
650 during the DESTROY phase of the cache object, usually at the end of
653 To have the dirty data write() periodically while WriteSync is set to 0,
654 there is a flush() cache API call that will flush the dirty writes
655 in this way. Just call the flush() API like:
657 my $write_flush_count = tied(%cache)->flush();
659 The flush() API was added in the .17 release thanks to Rob Bloodgood.
661 =head1 TRUE CACHE EXAMPLE
665 # personalize the Tie::Cache object, by inheriting from it
667 @ISA = qw(Tie::Cache);
669 # override the read() and write() member functions
670 # these tell the cache what to do with a cache miss or flush
672 my($self, $key) = @_;
673 print
"cache miss for $key, read() data\n";
677 my($self, $key, $value) = @_;
678 print
"flushing [$key, $value] from cache, write() data\n";
681 my $cache_size = $ARGV[0] || 2;
682 my $num_to_cache = $ARGV[1] || 4;
683 my $Debug = $ARGV[2] || 1;
685 tie %cache,
'My::Cache', $cache_size, {Debug => $Debug};
687 # load the cache with new data, each through its contents,
688 # and then reload in reverse order.
689 for(1..$num_to_cache) { print
"read data $_: $cache{$_}\n" }
690 while(my($k, $v) = each %cache) { print
"each data $k: $v\n"; }
691 for(my $i=$num_to_cache; $i>0; $i--) { print
"read data $i: $cache{$i}\n"; }
693 # flush writes now, trivial use since will happen in DESTROY() anyway
694 tied(%cache)->flush();
696 # clear cache in 2 ways, write will flush out to disk
702 Many thanks to all those who helped me make
this module a reality,
705 :) Tom Hukins who provided me insight and motivation
for
706 finishing
this module.
707 :) Jamie McCarthy,
for trying to make Tie::Cache be all
709 :) Rob Fugina who knows how to
"TRULY CACHE".
710 :) Rob Bloodgood,
for the TRUE CACHE flush() API
714 Please send any questions or comments to Joshua Chamas
715 at chamas@alumni.stanford.org
719 Copyright (c) 1999-2002 Joshua Chamas, Chamas Enterprises Inc.
720 Sponsored by development on NodeWorks http:
722 All rights reserved. This program is free software;
723 you can redistribute it and/or modify it under the same
724 terms as Perl itself.