#! /usr/local/bin/perl

$limit = $ARGV[0] || ~ 0;

$t = times;

# Calculate a vector of skips, so that I never check multiples of the first n
# primes.  For each small prime calculate the skip list based on the skip list
# of the next samller prime.  Compare each candidate only to the odd multiples
# of the current prime.
@skip = $max = 2;
print "2\n";
for $prime ( 3, 5, 7, 11, 13, 17, 19, 23 ) {
    print "$prime\n";
    $max *= $prime;
    my $prev = $candidate = 0;
    my $i = $prime;
    my $prime2 = $prime << 1;
    my $nextoddmultiple = $prime + $prime2;
    my $max = $max + $i;
    $max = $limit if $max > $limit;
    my @preskip = @skip;
    @skip = ();
  SERIES: while() {
	for( @preskip ) {
	    $i += $_;
	    $nextoddmultiple += $prime2 while $i > $nextoddmultiple;
	    if( $i < $nextoddmultiple ) {
		$candidate ||= $i;
		push @skip, $i - $prev if $prev;
		$prev = $i;
	    }
	    last SERIES if $i > $max;
	}
    }
}

print "$candidate\n";
@primes = $candidate;

$endnormal = -1;
$end = $primesmin = $primesi = $delayskip = 0;
$nextprimeproduct = $max = $candidate * $candidate;
# Dataset for predictable non primes:
# 0: the prime this concerns (constant)
# 1: this prime multiplied by the next prime giving an as yet unreached product
# 2: the index of that next prime
# 3: this prime**3, limit where this starts failing
@prediction = [$candidate, $nextprimeproduct, 0, $candidate * $nextprimeproduct];

print STDERR 'step 1: ', -$t + ($t = times), "s\n";

while() {
    for( @skip ) {
	$candidate += $_;
	exit if $candidate > $limit;
	$delayskip += $_;
	if( $candidate < $nextprimeproduct ) {
	    # didn't reach the next predicted non-prime, so do modulus checking
	    $div = 0;
	    for( @primes[0..$endnormal] ) {
		if ( ($_->[1] += $delayskip) >= $_->[0] ) {
		    $_->[1] -= $_->[0] until $_->[1] < $_->[0];
		    $div = 1 unless $_->[1];
		}
	    }
	    unless( $div ) {
		print "$candidate\n";
		push @primes, $candidate;
	    }
	    $delayskip = 0;
	    next;
	}

	# remove first element, multiply it with next prime
	$x = shift @prediction;
	$x1 = $$x[1] = $$x[0] * $primes[++$$x[2]];
	if( $x1 > $max ) {
	    # multiplication surpassed next unused prime to the square, so push an element for that
	    $end++;
	    $max = $primes[$end] * $primes[$end];
	    push @prediction, [$primes[$end], $max, $end, $primes[$end] * $max];
	}

	if( !$$x[3] ) {
	    # We reached prime**3, so eliminate prime, activate it for normal sieve
	    $endnormal++;
	    $primes[$endnormal] = [$primes[$endnormal], ($candidate - $delayskip) % $primes[$endnormal]]
	} else {
	    if( $x1 >= $$x[3] ) {
		# We multiplied past prime**3, so reinsert that, instead of $x1
		$x1 = $$x[1] = $$x[3];
		$$x[3] = 0;
	    }
	    # reinsert element in order, treating @prediction as a B-tree, such that the smallest is first
	    $a = 0;
	    $z = @prediction;
	    $i = $z >> 1;
	    while() {
		if( $x1 < ${$prediction[$i]}[1] ) {
		    $z = $i;
		} else {
		    $a = ++$i;
		}
		last if $a == $z;
		$i = ($a + $z) >> 1;
	    }
	    splice @prediction, $i, 0, $x;
	}
	$nextprimeproduct = ${$prediction[0]}[1];
    }
}

END { print STDERR 'step 2: ', times - $t, "s\nsize: ", (`ps -osz -p $$`)[1] }

__END__

=head1 Schnelles Primzahlsieb

I<English translation L<below|fast_prime_sieve>>

Das berühmte Sieb des Eratosthenes hat einige Nachteile: es ist sehr
ineffizient, da es immer wieder die selben Werte durchstreicht.  Es braucht
Platz für so viele Zahlen wie man untersuchen möchte, und taugt nicht um
weiterzusuchen, wenn man mehr Zahlen möchte.

=head2 Eratosthenes umgekehrt

Darum invertiere ich es: ich schaue mir die Kandidaten einen nach dem anderen
an, und prüfe ob sie modulo einer Primzahl bis zur Quadratewurzel null sind,
d.h. durch diese teilbar.  Nur sonst habe ich eine weitere Primzahl gefunden.
Um das zu tun, muß ich alle Primzahlen speichern.  Um nicht jedes Mal den
Modulus berechnen zu müssen, führe ich für jeden einen Zähler, so daß der
Modulus aus einer Addition, ein oder zwei Vergleichen und ggf. einer
Substraktion besteht.

Wenn man Eratosthenes betrachtet, ist es offenbar, daß jeder zweite Kandidat
gerade ist.  Also überspringe ich die, indem ich jeweils zwei addiere.  Dann
ist jeder zweite verbleibende Kandidat durch drei teilbar.  Die kann ich auch
übrspringen, indem ich stattdessen abwechselnd zwei und vier addiere.  Wenn
man das weitertreibt, wird's komplizierter: die verbleibenden Vielfachen von
fünf sind 25, 35, 55, 65...

Wenn man diese Sprungserien berechnet, werden sie exponentiell lang, bevor sie
wiederholt werden können, selbst für eine kleine Anzahl von Primzahlen, deren
Vielfache ich vermeiden möchte.  Für Primzahlen bis 17 ist die Sprungliste
92159 Einträge lang, bis 19 schon 1658879 und bis 23 umwerfende 36495360.
Aber es bedeutet wesentlich weniger Kandidaten zu betrachten, und einige
Teilbarkeitsprüfungen weniger pro Kandidat.  Es ist die Sache also wert, wie
Zeitmessungen belegen.

=head2 Vorhersehbare nicht-Primzahlen

Da ich nie einen Kandidaten teste, der durch die ersten paar Primzahlen
teilbar ist, brauche ich auch nicht gegen jene Modulus zu testen.  Daher sind
auch die nächsten nicht-Primzahlen vorhersehbar: 29*29, 29*31, 29*37...,
31*31, 31*37..., 37*37...  Indem ich jeweils den nächsten jeder Serie
berechne, und die sortiere, muß ich jeden Kandidaten nur mit der kleinsten
davon vergleichen, um zu wissen, daß er nicht prim ist.  Dann multipliziere
ich den Kameraden mit der nächsten Primzahl und sortier ihn wieder ein.

Leider versagt das, wenn ich 29**3, 31**3, 37**3... überschreite.  Also muß
ich an jeder dieser Stellen die Primzahl aus der Vorhersageliste entfernen und
sie in die Moduluskontrollliste übernehmen.  Das Ergebnis ist, daß
Modulusprüfung nicht beim Quadrat jeder Primzahl anfängt, sondern erst bei der
dritten Potenz.  Und die Prüfungen gehen nicht mehr bis zur Quadratwurzel
jedes Kandidaten, sondern nur bis zur dritten.

=head2 Performanz

Die Tabelle zeigt, wie viel Rechenzeit gebraucht wird, um alle Primzahlen bis
zur jeweiligen Zehnerpotenz zu ermitteln.  Solch eine Obergrenze kannC< primes >als
Aufrufparameter mitgegeben werden.  Die Speichergrößen sind die, die einC< ps
-l >anzeigt, abzüglich der für ein nacktes Perl.

      1e5	0,5s	1,5Mb
      1e6	4s	12Mb
      1e7	57s	102Mb
      1e8	933s	617Mb

Trotz aller Tricks ist dieser Algorithmus immer noch O( I<n> log I<n> ).  Die
oben aufgelisteten Zeiten wurden mit Perl 5.8.5 auf einem 1,5GHz Pentium mit
1Gb erzielt.

Während die Zeit theoretisch eine unbegrenzte Ressource ist, ist der Speicher
das leider nicht.  Und auch der Speicherverbrauch ist immerhin O( log I<n> ).
Er könnte um einen Faktor reduziert werden, indem man Bitvektoren statt
Skalaren nähme.  Die Sprungserie paßt in einen ein-Byte-Vektor, und könnte
sogar komprimiert werden, da sie sich immer wieder einigermaßen wiederholt.
Die bislang unbenötigten Primzahlen (größer der dritten Wurzel des Kandidaten)
könnten auf Platte ausgespult werden, was eine immense Hauptspeicherersparnis
brächte.  Aber der Verbrauch kann nicht dauerhaft innerhalb einer vorgegebenen
Obergrenze gehalten werden :-(

Parallelisierung wäre möglich, wobei z.B. ein erster Thread den 1. + 2. +
3. Sprung vollführt, und danach den 4. + 5. + 6., ein zweiter Thread den 1.,
und danach den 2. + 3. + 4. und der letzte den 1. + 2., und danach den 3. +
4. + 5, usw.  Die Threads müßten aufeinander warten, wenn die Vorhersageliste
umgeordnet wird, oder wenn die letzte abgestimmte Primzahl hoch drei gefunden
wird.


=head1 Fast Prime Sieve

The famous Sieve of Eratosthenes has a few drawbacks: it is very inefficient,
barring the same values again and again.  It requires space for as many
numbers as you want to analyse, and can not be used to continue if you want
more numbers.

=head2 Inverting Eratosthenes

That's why I invert it: I look at candidates one by one, checking if they are
zero modulo any of the primes up to its square root, i.e. divisible by it.
Only otherwise have I found another prime.  To do this I must store each
encountered prime.  To save calculating a modulus at each step, I add a
counter to each, where modulus is an addition, one or two comparisons, and
maybe a substraction.

When looking at Eratosthenes, it becomes evident that every second candidate
is even.  So I skip them, by adding two at each step.  Then every other
remaining candidate is divisible by three.  I can skip those too, by
alternately adding two and four instead.  When driving this further, it
becomes more complicated: the remaining multiples of five to skip are 25, 35,
55, 65...

When calculating these series of skips, they become exponentially long, before
they can be repeated, even for a small number of prime multiples I want to
skip.  For primes up to 17 the skip list is 92159 entries long, up to 19
already 1658879, and up to 23 a whopping 36495360.  But it means looking at
far less candidates, and doing a few less divisibility tests for each
candidate.  So it's definitely worth it, as timing tests show!

=head2 Predictable Non-Primes

Since I never test a candidate divisible by the first few primes, I never need
to modulus test against those.  So the next non-primes are predictable: 29*29,
29*31, 29*37..., 31*31, 31*37..., 37*37...  By calculating the next of each
series and ordering them all, I must only look for the smallest of them to
know it's not a prime.  Then I multiply that by the next prime and reinsert it
into the prediction list.

This alas fails, when I pass 29**3, 31**3, 37**3...  So at each of those
stages I take the prime out of the prediction list into the modulus check
list.  Result is that modulus checking starts not at the square of a prime,
but only at the power of three.  And checks no loger go up to the square root
of each candidate, but only up to the 3rd root.

=head2 Performance

The L<table|performanz> shows how much CPU time it takes to obtain all primes
up to the given power of ten.  Such an upper bound can be passsed toC< primes
>as a command line parameter.  The memory sizes are those shown byC< ps -l
>less that for a naked Perl.

For all the tricks, this algorithm is still O( I<n> log I<n> ).  The times given
L<above|performanz> were obtained with Perl 5.8.5 on a 1.5GHz Pentium with
1Gb.

While time is theoretically an illimited resource, memory alas is not.  And
memory consumption is still O( log I<n> ).  It could be reduced by a factor,
when using bit vectors, instead of lists of scalars.  The skip list only needs
one byte values and could even be compressed, because it is fairly repetitive.
The as yet unneeded primes (greater than third root of the candidate) could be
spooled out to disk, which would tremendously save main memory.  But
consumption cannot be indefinitely kept within a given upper limit :-(

Parallelization would be possible, e.g. by having one thread perform the 1st +
2nd + 3rd skip, followed by the 4th + 5th + 6th skip, the next thread the 1st,
followed by the 2nd + 3rd + 4th skip, and the last the 1st + 2nd, followed by
the 3rd + 4th + 5th skip, etc.  The threads would all have to catch up, when
reordering the prediction list, and when reaching a prime greater than the last
coordinated prime to a power of three.

=head1 Author

Daniel Pfeiffer <occitan@esperanto.org>

=begin CPAN

=head1 README

B<Fast Prime Sieve>
bounded only by memory with Math::BigInt::Lite
Optimizations:
B< · >never test product of first I<n> primes
B< · >quick test products in I<p>**2 .. I<p>**3
B< · >modulus by addition up to 3rd root

=pod SCRIPT CATEGORIES

Educational/ComputerScience
Search