Overclock.net › Forums › Software, Programming and Coding › Coding and Programming › High-precision program that calculates 2^n
New Posts  All Forums:Forum Nav:

High-precision program that calculates 2^n - Page 2

post #11 of 14
http://en.wikipedia.org/wiki/Long_double

Wikipedia seems to disagree. Huh?
post #12 of 14
Quote:
Originally Posted by 3930K View Post

http://en.wikipedia.org/wiki/Long_double
Wikipedia seems to disagree. Huh?

long double != unsigned long double
You can't use unsigned. Unsigned floating points literally do not exist smile.gif
Edited by tompsonn - 11/3/12 at 8:09am
My System
(30 items)
 
"Zeus"
(13 items)
 
 
CPUMotherboardGraphicsRAM
Intel Core i5 2500K (4.5ghz @ 1.320v) Gigabyte Z68X-UD3R-B3 MSI R7970 Lightning Corsair 16GB (4x4GB) 
Hard DriveHard DriveHard DriveHard Drive
Plextor PX-256M5S 256GB Crucial M4 128GB Hitachi HDS721010CLA332 Hitachi HDS723020BLA642 
Hard DriveHard DriveHard DriveOptical Drive
Hitachi HDS723020BLA642 Hitachi HUA722010CLA330 WDC WD10EARS-00Z5B1 TSSTcorp CDDVDW SH-S223B 
CoolingCoolingOSMonitor
Phanteks PH-TC14PE with TY-140's Lamptron FCv5 (x2) Windows 7 Ultimate 64-bit Dell U2412M 
MonitorMonitorMonitorKeyboard
Dell U2412M Dell U2212HM Dell U2212HM Ducky DK9087 G2 Pro 
PowerCaseMouseMouse Pad
Corsair AX-750 Corsair Obsidian 650D Microsoft IntelliMouse Optical  XTRAC Ripper XXL 
AudioAudioAudioAudio
Westone W3 IEMs RE-272 IEMs Shure SE-215 IEMs Schiit Bifrost DAC 
AudioAudio
Schiit Asgard 2 amp HiVi Swan M50W 2.1 
CPUMotherboardGraphicsRAM
Intel Core i7 950 GA-X58-UD3R Radeon HD 5450  24GB Corsair @ 1333mhz 
Hard DriveOSPowerCase
4x WD Cavair Red 1TB in RAID 0 Windows Server 2008 R2 x64 Corsair HX-520 LianLi LanCool 
  hide details  
Reply
My System
(30 items)
 
"Zeus"
(13 items)
 
 
CPUMotherboardGraphicsRAM
Intel Core i5 2500K (4.5ghz @ 1.320v) Gigabyte Z68X-UD3R-B3 MSI R7970 Lightning Corsair 16GB (4x4GB) 
Hard DriveHard DriveHard DriveHard Drive
Plextor PX-256M5S 256GB Crucial M4 128GB Hitachi HDS721010CLA332 Hitachi HDS723020BLA642 
Hard DriveHard DriveHard DriveOptical Drive
Hitachi HDS723020BLA642 Hitachi HUA722010CLA330 WDC WD10EARS-00Z5B1 TSSTcorp CDDVDW SH-S223B 
CoolingCoolingOSMonitor
Phanteks PH-TC14PE with TY-140's Lamptron FCv5 (x2) Windows 7 Ultimate 64-bit Dell U2412M 
MonitorMonitorMonitorKeyboard
Dell U2412M Dell U2212HM Dell U2212HM Ducky DK9087 G2 Pro 
PowerCaseMouseMouse Pad
Corsair AX-750 Corsair Obsidian 650D Microsoft IntelliMouse Optical  XTRAC Ripper XXL 
AudioAudioAudioAudio
Westone W3 IEMs RE-272 IEMs Shure SE-215 IEMs Schiit Bifrost DAC 
AudioAudio
Schiit Asgard 2 amp HiVi Swan M50W 2.1 
CPUMotherboardGraphicsRAM
Intel Core i7 950 GA-X58-UD3R Radeon HD 5450  24GB Corsair @ 1333mhz 
Hard DriveOSPowerCase
4x WD Cavair Red 1TB in RAID 0 Windows Server 2008 R2 x64 Corsair HX-520 LianLi LanCool 
  hide details  
Reply
post #13 of 14
Quote:
Originally Posted by tompsonn View Post

Quote:
Originally Posted by 3930K View Post

http://en.wikipedia.org/wiki/Long_double
Wikipedia seems to disagree. Huh?

long double != unsigned long double
You can't use unsigned. Unsigned floating points literally do not exist smile.gif
Oh ok doh.gif

Thanks!
post #14 of 14
Here comes ...


EDIT (2012-Nov-07): Well, last night I wrote 'MokujIN' revision 3+, it stands for Multiplication of INtegers in C using strings.
In the ZIP archive both Intel v12.1 and Microsoft v16 Windows executables are given along with source.
Revision 3+ has optimized main loop, as a side effect now a new CPU clock pseudo-measure has emerged:
MokujINs - it stands for number of cycles of main loop of MUL function made per second.
My 'Bonboniera' gave 77 MegaMokujINs i.e. 77+ million iterations per second.
At each iteration/cycle a digit vs digit multiplication is made i.e. the two operands are a single digit long.

In fact this console tool is a tiny-finy benchmarker which produces 315,653 digits long number.
Its performance depends on CPU clock and L2 cache speed mostly.

The only thing you need is to run 'RUNME.bat':
Code:
D:\_KAZE\MokujIN>dir
 Volume in drive D is S640_Vol5
 Volume Serial Number is F85D-148B

 Directory of D:\_KAZE\MokujIN

11/07/2012  07:20 AM    <DIR>          .
11/07/2012  07:20 AM    <DIR>          ..
11/07/2012  07:20 AM                 2 ENTER
11/07/2012  07:20 AM            32,826 MokujIN.c
11/07/2012  07:20 AM         2,141,184 MokujIN.doc
11/07/2012  07:20 AM         1,006,071 MokujIN.pdf
11/07/2012  07:20 AM         2,097,720 Mokujin_-_CG_Art_Image_-_Tekken_6_Bloodline_Rebellion.jpg
11/07/2012  07:20 AM               139 MokujIN_compile_Intel.bat
11/07/2012  07:20 AM               138 MokujIN_compile_Microsoft.bat
11/07/2012  07:20 AM           152,775 MokujIN_Intel_XE_IA-32_Version_12.1.cod
11/07/2012  07:20 AM            81,920 MokujIN_Intel_XE_IA-32_Version_12.1.exe
11/07/2012  07:20 AM            51,264 MokujIN_Microsoft_32-bit_Version_16.cod
11/07/2012  07:20 AM            64,512 MokujIN_Microsoft_32-bit_Version_16.exe
11/07/2012  07:20 AM             1,010 RUNME.bat
11/07/2012  07:20 AM             9,458 sha1sum.c
11/07/2012  07:20 AM            62,464 sha1sum.exe
11/07/2012  07:20 AM                18 SHA1_compile.bat
11/07/2012  07:20 AM             4,096 Timer.exe
              16 File(s)      5,705,597 bytes
               2 Dir(s)   3,942,146,048 bytes free

D:\_KAZE\MokujIN>RUNME.bat
Revision 3:
Computing 2^1048576 took 0,454 seconds with '/TURBO' with Intel v12.1 on T7500 2200MHz.
Computing 2^1048576 took 1,856 seconds without '/TURBO' with Intel v12.1 on T7500 2200MHz.
Computing 2^1048576 took 0,426 seconds with '/TURBO' with Microsoft v16 on T7500 2200MHz.
Computing 2^1048576 took 1,678 seconds without '/TURBO' with Microsoft v16 on T7500 2200MHz.
SHA1 should be:
adebb3aac8ded6438719f8170a455f38dfebaae3
Computing 2^1048576 ...

D:\_KAZE\MokujIN>time0<enter 1>TotalTime.txt

D:\_KAZE\MokujIN>"MokujIN_Intel_XE_IA-32_Version_12.1" 2 1048576 /stats
Multiplying performance for operands 1 digits long: 1 MokujINs i.e. digits per second.
Multiplying performance for operands 1 digits long: 1 MokujINs i.e. digits per second.
Multiplying performance for operands 2 digits long: 4 MokujINs i.e. digits per second.
Multiplying performance for operands 3 digits long: 9 MokujINs i.e. digits per second.
Multiplying performance for operands 5 digits long: 25 MokujINs i.e. digits per second.
Multiplying performance for operands 10 digits long: 100 MokujINs i.e. digits per second.
Multiplying performance for operands 20 digits long: 400 MokujINs i.e. digits per second.
Multiplying performance for operands 39 digits long: 1521 MokujINs i.e. digits per second.
Multiplying performance for operands 78 digits long: 6084 MokujINs i.e. digits per second.
Multiplying performance for operands 155 digits long: 24025 MokujINs i.e. digits per second.
Multiplying performance for operands 309 digits long: 95481 MokujINs i.e. digits per second.
Multiplying performance for operands 617 digits long: 380689 MokujINs i.e. digits per second.
Multiplying performance for operands 1234 digits long: 1522756 MokujINs i.e. digits per second.
Multiplying performance for operands 2467 digits long: 6086089 MokujINs i.e. digits per second.
Multiplying performance for operands 4933 digits long: 24334489 MokujINs i.e. digits per second.
Multiplying performance for operands 9865 digits long: 48659112 MokujINs i.e. digits per second.
Multiplying performance for operands 19729 digits long: 77846688 MokujINs i.e. digits per second.
Multiplying performance for operands 39457 digits long: 74135945 MokujINs i.e. digits per second.
Multiplying performance for operands 78914 digits long: 73263757 MokujINs i.e. digits per second.
Multiplying performance for operands 157827 digits long: 73262829 MokujINs i.e. digits per second.
Dumping the result to 'MokujIN.txt' ... OK

D:\_KAZE\MokujIN>sha1sum.exe MokujIN.txt
adebb3aac8ded6438719f8170a455f38dfebaae3  MokujIN.txt

D:\_KAZE\MokujIN>time0<enter 1>>TotalTime.txt

D:\_KAZE\MokujIN>"MokujIN_Intel_XE_IA-32_Version_12.1" 2 1048576  1>MokujIN.txt

D:\_KAZE\MokujIN>sha1sum.exe MokujIN.txt
adebb3aac8ded6438719f8170a455f38dfebaae3  MokujIN.txt

D:\_KAZE\MokujIN>time0<enter 1>>TotalTime.txt

D:\_KAZE\MokujIN>"MokujIN_Microsoft_32-bit_Version_16" 2 1048576 /stats
Multiplying performance for operands 1 digits long: 1 MokujINs i.e. digits per second.
Multiplying performance for operands 1 digits long: 1 MokujINs i.e. digits per second.
Multiplying performance for operands 2 digits long: 4 MokujINs i.e. digits per second.
Multiplying performance for operands 3 digits long: 9 MokujINs i.e. digits per second.
Multiplying performance for operands 5 digits long: 25 MokujINs i.e. digits per second.
Multiplying performance for operands 10 digits long: 100 MokujINs i.e. digits per second.
Multiplying performance for operands 20 digits long: 400 MokujINs i.e. digits per second.
Multiplying performance for operands 39 digits long: 1521 MokujINs i.e. digits per second.
Multiplying performance for operands 78 digits long: 6084 MokujINs i.e. digits per second.
Multiplying performance for operands 155 digits long: 24025 MokujINs i.e. digits per second.
Multiplying performance for operands 309 digits long: 95481 MokujINs i.e. digits per second.
Multiplying performance for operands 617 digits long: 380689 MokujINs i.e. digits per second.
Multiplying performance for operands 1234 digits long: 1522756 MokujINs i.e. digits per second.
Multiplying performance for operands 2467 digits long: 6086089 MokujINs i.e. digits per second.
Multiplying performance for operands 4933 digits long: 24334489 MokujINs i.e. digits per second.
Multiplying performance for operands 9865 digits long: 97318225 MokujINs i.e. digits per second.
Multiplying performance for operands 19729 digits long: 77846688 MokujINs i.e. digits per second.
Multiplying performance for operands 39457 digits long: 77842742 MokujINs i.e. digits per second.
Multiplying performance for operands 78914 digits long: 78828093 MokujINs i.e. digits per second.
Multiplying performance for operands 157827 digits long: 78085774 MokujINs i.e. digits per second.
Dumping the result to 'MokujIN.txt' ... OK

D:\_KAZE\MokujIN>sha1sum.exe MokujIN.txt
adebb3aac8ded6438719f8170a455f38dfebaae3  MokujIN.txt

D:\_KAZE\MokujIN>time0<enter 1>>TotalTime.txt

D:\_KAZE\MokujIN>"MokujIN_Microsoft_32-bit_Version_16" 2 1048576  1>MokujIN.txt

D:\_KAZE\MokujIN>sha1sum.exe MokujIN.txt
adebb3aac8ded6438719f8170a455f38dfebaae3  MokujIN.txt

D:\_KAZE\MokujIN>time0<enter 1>>TotalTime.txt

D:\_KAZE\MokujIN>type TotalTime.txt
The current time is:  7:20:38.22
Enter the new time:
The current time is:  7:28:11.63
Enter the new time:
The current time is:  7:59:07.65
Enter the new time:
The current time is:  8:06:12.04
Enter the new time:
The current time is:  8:34:10.47
Enter the new time:

D:\_KAZE\MokujIN>

Interestingly, Microsoft v16 outspeeds Intel v12.1 on both 'DEFAULT' and 'TURBO' tests.

And the source itself:
Code:
// MokujIN, a string multiplier, written by Kaze, 2012-Nov-07, revision 3+.
// Free download at: www.sanmayce.com/Downloads/MokujIN.zip

// Revision 3:
// Computing 2^^1048576 took 0,454 seconds with '/TURBO' with Intel v12.1 on T7500 2200MHz.
// Computing 2^^1048576 took 1,856 seconds without '/TURBO' with Intel v12.1 on T7500 2200MHz.
// Computing 2^^1048576 took 0,426 seconds with '/TURBO' with Microsoft v16 on T7500 2200MHz.
// Computing 2^^1048576 took 1,678 seconds without '/TURBO' with Microsoft v16 on T7500 2200MHz.

// On August 23rd, 2008, a UCLA computer in the GIMPS PrimeNet network discovered the 45th (biggest so far (2010-July)) known Mersenne prime,
// 2^43,112,609-1, a mammoth 12,978,189 digit number!
// Now, who can wait for 'MokujIN.exe 2 43112609 /turbo' to complete?!

/*
D:\_KAZE\MokujIN>MokujIN_Microsoft_32-bit_Version_16.exe /help
MokujIN, Multiplication of INtegers, written by Kaze, 2012-Nov-06, revision 3.
Usage: MokujIN [Number Power [/turbo]]
Note1: Power is signed 32bit integer i.e. up to 2^31-1=2,147,483,647.
Note2: Multiplicand or/and Multiplier cannot exceed 12978189 digits.
Note3: TURBO regime is several times (only) faster.
Example1: D:\_KAZE\MokujIN>MokujIN.exe 18446744073709551616 2
340282366920938463463374607431768211456
Example2: D:\_KAZE\MokujIN>MokujIN.exe
Multiplicand: 18446744073709551616
Multiplier  : 18446744073709551616
Result      : 340282366920938463463374607431768211456

D:\_KAZE\MokujIN>
*/

#include <stdio.h>
#include <time.h>
#define memory_size 12978189 // equals MAX(Multiplicand,Multiplier)

unsigned char* MUL(unsigned char* Result, unsigned char* Multiplicand, unsigned char* Multiplier)
{
/*
'QuickBASIC was/is an excellent environment/compiler, eh.
FUNCTION MUL$ (Multiplicand$, Multiplier$)
'DimSum = CandLength + ErLength
'REDIM Result%(1 TO DimSum)
FOR SF = 1 TO CandLength 
FOR QB = 1 TO ErLength
        CarryFlag = 0
        Cycle = QB - 1 + SF
        Tiller = Cand%(SF) * Er%(QB)
        Result%(Cycle) = Tiller MOD 10 + Result%(Cycle)
        IF Result%(Cycle) >= 10 THEN
                Result%(Cycle) = Result%(Cycle) - 10
                CarryFlag = 1
        END IF
        NextNumPos = Cycle + 1
        Result%(NextNumPos) = Result%(NextNumPos) + CarryFlag + Tiller \ 10
        DO WHILE Result%(NextNumPos) >= 10
                Result%(NextNumPos) = Result%(NextNumPos) - 10
                NextNumPos = NextNumPos + 1
                Result%(NextNumPos) = Result%(NextNumPos) + 1
        LOOP
NEXT QB
NEXT SF
END FUNCTION
*/

        // 'unsigned long long' makes the computation ugly with 32bit code, so going down to 'unsigned long':
        // 8:14:04.78
        // 8:25:46.22
        // vs
        // 1:41:43.84
        // 1:52:33.59
        // Or almost a minute ugliness!

        unsigned long SF, QB, Cycle, NextNumPos;
        unsigned long ResLength, CandLength, ErLength;
        unsigned char CarryFlag, TillerMostSignificantDigit, TillerLeastSignificantDigit;

        // 0..9 * 0..9
        unsigned char MSDarray[10][10] = {
                0,0,0,0,0,0,0,0,0,0, // 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9,
                0,0,0,0,0,0,0,0,0,0, // 1x0, 1x1, 1x2, 1x3, 1x4, 1x5, 1x6, 1x7, 1x8, 1x9,
                0,0,0,0,0,1,1,1,1,1, // 2x0, 2x1, 2x2, 2x3, 2x4, 2x5, 2x6, 2x7, 2x8, 2x9, 
                0,0,0,0,1,1,1,2,2,2, // 3x0, 3x1, 3x2, 3x3, 3x4, 3x5, 3x6, 3x7, 3x8, 3x9, 
                0,0,0,1,1,2,2,2,3,3, // 4x0, 4x1, 4x2, 4x3, 4x4, 4x5, 4x6, 4x7, 4x8, 4x9, 
                0,0,1,1,2,2,3,3,4,4, // 5x0, 5x1, 5x2, 5x3, 5x4, 5x5, 5x6, 5x7, 5x8, 5x9, 
                0,0,1,1,2,3,3,4,4,5, // 6x0, 6x1, 6x2, 6x3, 6x4, 6x5, 6x6, 6x7, 6x8, 6x9, 
                0,0,1,2,2,3,4,4,5,6, // 7x0, 7x1, 7x2, 7x3, 7x4, 7x5, 7x6, 7x7, 7x8, 7x9, 
                0,0,1,2,3,4,4,5,6,7, // 8x0, 8x1, 8x2, 8x3, 8x4, 8x5, 8x6, 8x7, 8x8, 8x9, 
                0,0,1,2,3,4,5,6,7,8  // 9x0, 9x1, 9x2, 9x3, 9x4, 9x5, 9x6, 9x7, 9x8, 9x9
        };
        unsigned char LSDarray[10][10] = {
                0,0,0,0,0,0,0,0,0,0, // 0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8, 0x9,
                0,1,2,3,4,5,6,7,8,9, // 1x0, 1x1, 1x2, 1x3, 1x4, 1x5, 1x6, 1x7, 1x8, 1x9,
                0,2,4,6,8,0,2,4,6,8, // 2x0, 2x1, 2x2, 2x3, 2x4, 2x5, 2x6, 2x7, 2x8, 2x9, 
                0,3,6,9,2,5,8,1,4,7, // 3x0, 3x1, 3x2, 3x3, 3x4, 3x5, 3x6, 3x7, 3x8, 3x9, 
                0,4,8,2,6,0,4,8,2,6, // 4x0, 4x1, 4x2, 4x3, 4x4, 4x5, 4x6, 4x7, 4x8, 4x9, 
                0,5,0,5,0,5,0,5,0,5, // 5x0, 5x1, 5x2, 5x3, 5x4, 5x5, 5x6, 5x7, 5x8, 5x9, 
                0,6,2,8,4,0,6,2,8,4, // 6x0, 6x1, 6x2, 6x3, 6x4, 6x5, 6x6, 6x7, 6x8, 6x9, 
                0,7,4,1,8,5,2,9,6,3, // 7x0, 7x1, 7x2, 7x3, 7x4, 7x5, 7x6, 7x7, 7x8, 7x9, 
                0,8,6,4,2,0,8,6,4,2, // 8x0, 8x1, 8x2, 8x3, 8x4, 8x5, 8x6, 8x7, 8x8, 8x9, 
                0,9,8,7,6,5,4,3,2,1  // 9x0, 9x1, 9x2, 9x3, 9x4, 9x5, 9x6, 9x7, 9x8, 9x9
        };
        // With above look-up tables (they replace all nasty *MULs, %MODs) we have a boost:
        // 1:41:43.84
        // 1:52:33.59 or 52*60+33-(41*60+43)= 650s
        // vs
        // 3:00:38.26
        // 3:08:11.38 or 08*60+11-(00*60+38)= 453s
        // Or 3+ minutes ugliness!
        
        CandLength =strlen(Multiplicand);
        ErLength =strlen(Multiplier);
        memset(Result, 0, (CandLength+ErLength)+1);

        if ( (ErLength == 1 && Multiplier[0] == 0+'0') || (CandLength == 1 && Multiplicand[0] == 0+'0') ) {
                Result[0] = 0+'0';
                return Result;
        }

        // In C the offset starts from 0 whereas in QuickBasic from 1, therefore '<' becomes '<='.
        for (SF=1; SF<=CandLength; SF++) {
        for (QB=1; QB<=ErLength; QB++) {
                CarryFlag = 0;
                Cycle = QB - 1 + SF;
                // In C the offset starts from 0 whereas in QuickBasic from 1, therefore '[SF]' becomes '[SF-1]'.
                // Here the strings are not reversed as in QB arrays, so [CandLength-SF+1] becomes [(CandLength-SF+1)-1].
                //Tiller = (Multiplicand[(CandLength-SF+1)-1]-'0') * (Multiplier[(ErLength-QB+1)-1]-'0');
                // LOOK-UP TABLE BOOST: Above COMMENTED line becomes next two:
                TillerLeastSignificantDigit = LSDarray[ Multiplicand[(CandLength-SF+1)-1]-'0' ][ Multiplier[(ErLength-QB+1)-1]-'0' ];
                TillerMostSignificantDigit = MSDarray[ Multiplicand[(CandLength-SF+1)-1]-'0' ][ Multiplier[(ErLength-QB+1)-1]-'0' ];
                //Result[Cycle-1] = (Tiller%10) + Result[Cycle-1];
                // LOOK-UP TABLE BOOST: Above COMMENTED line becomes next one:
                Result[Cycle-1] = TillerLeastSignificantDigit + Result[Cycle-1];
                if ( Result[Cycle-1] >= 10 ) {
                        Result[Cycle-1] = Result[Cycle-1] - 10;
                        CarryFlag = 1;
                }
                NextNumPos = Cycle + 1;
                //Result[NextNumPos-1] = Result[NextNumPos-1] + CarryFlag + (unsigned char)(Tiller/10);
                // LOOK-UP TABLE BOOST: Above COMMENTED line becomes next one:
                Result[NextNumPos-1] = Result[NextNumPos-1] + CarryFlag + TillerMostSignificantDigit;
                while (Result[NextNumPos-1] >= 10) {
                        Result[NextNumPos-1] = Result[NextNumPos-1] - 10;
                        NextNumPos = NextNumPos + 1;
                        Result[NextNumPos-1] = Result[NextNumPos-1] + 1;
                }
        }
        }

        //Here we have the result (REVERSED) in ASCII codes i.e. '0' stands for ASCII 000.
        //ResLength is either (CandLength+ErLength) or (CandLength+ErLength-1).
        ResLength=(CandLength+ErLength);
        if (Result[ResLength-1] == 0x00) ResLength--;

        //The last thing to be done: the in-place reversal:
        for (SF=1; SF<=ResLength/2; SF++) {
                CarryFlag = Result[SF-1];
                Result[SF-1] = Result[(ResLength-SF+1)-1]+'0';
                Result[(ResLength-SF+1)-1] = CarryFlag+'0';
        }
        if (ResLength%2 != 0)
                Result[(ResLength/2+1)-1] = Result[(ResLength/2+1)-1]+'0';
        //Terminate it:
        Result[ResLength] = 0;

        return Result;
}

int main(int argc, char *argv[])
{
        signed int n, p, p31;
        unsigned char* pointerflushUNALIGN;
        unsigned char* Multiplicand;
        unsigned char* Multiplier;
        time_t t1, t2;
        unsigned long long DPS;
        FILE *fp_out;

        if (argc != 3 && argc != 4) {
                printf("MokujIN, Multiplication of INtegers, written by Kaze, 2012-Nov-07, revision 3+.\n");
                printf("Usage: MokujIN [Number Power [/turbo|/stats]]\n");
                printf("Note0: With no parameters given, interactive multiplication regime is on.\n");
                printf("Note1: Power is signed 32bit integer i.e. up to 2^31-1=2,147,483,647.\n");
                printf("Note2: Multiplicand or/and Multiplier cannot exceed %d digits.\n", memory_size);
                printf("Note3: '/turbo' regime is several times (only) faster.\n");
                printf("Note4: '/stats' is '/turbo' too, also dumps the result to 'MokujIN.txt'.\n");
                printf("Note5: When power is big enough (as in Example3) MokujINs (or the prosaic DPS) is an useful CPU clock pseudo-measure.\n");
                printf("Example1: D:\\_KAZE\\MokujIN>MokujIN.exe 18446744073709551616 2\n");
                printf("340282366920938463463374607431768211456\n");
                printf("Example2: D:\\_KAZE\\MokujIN>MokujIN.exe\n");
                printf("Multiplicand: 18446744073709551616\n");
                printf("Multiplier  : 18446744073709551616\n");
                printf("Result      : 340282366920938463463374607431768211456\n");
                printf("Example3: D:\\_KAZE\\MokujIN>MokujIN.exe 2 1048576 /stats\n");
                printf("...\n");
                printf("Multiplying performance for operands 39457 digits long: 77842742 MokujINs i.e. digits per second.\n");
                printf("Multiplying performance for operands 78914 digits long: 77842742 MokujINs i.e. digits per second.\n");
                printf("Multiplying performance for operands 157827 digits long: 77358266 MokujINs i.e. digits per second.\n");
                printf("Dumping the result to 'MokujIN.txt' ... OK\n");
        }

        pointerflushUNALIGN = (unsigned char*)malloc( memory_size*2+1 ); //+1 because we need sentinel
        if( pointerflushUNALIGN == NULL ) {
                printf("MokujIN: Needed memory allocation denied!\n");
                return( 1 );
        }
        Multiplicand = (unsigned char*)malloc( memory_size*2+1 ); //+1 because we need sentinel
        if( Multiplicand == NULL ) {
                printf("MokujIN: Needed memory allocation denied!\n");
                return( 1 );
        }
        Multiplier = (unsigned char*)malloc( memory_size*2+1 ); //+1 because we need sentinel
        if( Multiplier == NULL ) {
                printf("MokujIN: Needed memory allocation denied!\n");
                return( 1 );
        }

        if (argc == 3 || argc == 4) {
                p = atoi(argv[2]); //_atoi64(argv[2]);
                if (p < 2) {
                        printf("MokujIN: Power should be 2 or greater!\n");
                        free(pointerflushUNALIGN); free(Multiplicand); free(Multiplier);
                        return( 2 );
                }
                memcpy(Multiplicand, argv[1], strlen(argv[1])); Multiplicand[strlen(argv[1])]=0; // Not assuming the allocated pool is ZEROed!
                memcpy(Multiplier, argv[1], strlen(argv[1])); Multiplier[strlen(argv[1])]=0;
                if (argc == 4) {
                        p31 = 0;
                        while (p>>1) {
                                p31++;
                                p = p>>1;
                        }
                        p = atoi(argv[2]);
                        p = p - (1<<p31);
                        while (p31--) {
                                (void) time(&t1);
                                MUL(pointerflushUNALIGN, Multiplicand, Multiplier);
                                (void) time(&t2);
                                if (t2 <= t1) {t2 = t1; t2++;}
                                if ( strcmp("/stats\0",argv[3]) == 0 ) {
                                        DPS = (unsigned long long)strlen(Multiplier);
                                        DPS *= DPS;
                                        DPS = DPS/(unsigned long long)(t2-t1);
                                        printf("Multiplying performance for operands %lu digits long: %lu MokujINs i.e. digits per second.\n", strlen(Multiplier), DPS);
                                }
                                memcpy(Multiplicand, pointerflushUNALIGN, strlen(pointerflushUNALIGN)); Multiplicand[strlen(pointerflushUNALIGN)]=0;
                                memcpy(Multiplier, pointerflushUNALIGN, strlen(pointerflushUNALIGN)); Multiplier[strlen(pointerflushUNALIGN)]=0;
                        }
                        memcpy(Multiplier, argv[1], strlen(argv[1])); Multiplier[strlen(argv[1])]=0;
                        while (p--) {
                                MUL(pointerflushUNALIGN, Multiplicand, Multiplier);
                                memcpy(Multiplicand, pointerflushUNALIGN, strlen(pointerflushUNALIGN)); Multiplicand[strlen(pointerflushUNALIGN)]=0;
                        }
                } else {
                        while (p-->1) {
                                MUL(pointerflushUNALIGN, Multiplicand, Multiplier);
                                memcpy(Multiplicand, pointerflushUNALIGN, strlen(pointerflushUNALIGN)); Multiplicand[strlen(pointerflushUNALIGN)]=0;
                        }
                }
                if ( argc == 4 && strcmp("/stats\0",argv[3]) == 0 ) {
                        if( ( fp_out = fopen( "MokujIN.txt", "wb+" ) ) == NULL )
                        { printf( "MokujIN: Can't create file 'MokujIN.txt'.\n" ); return( 4 ); }
                        printf("Dumping the result to 'MokujIN.txt' ... ");
                        fprintf( fp_out, "%s\r\n", Multiplicand );
                        fclose(fp_out);
                        printf("OK\n");
                } else
                        printf("%s\n", Multiplicand);
                free(pointerflushUNALIGN); free(Multiplicand); free(Multiplier);
                return 0;
        } else if (argc == 1) {
                printf("\nMultiplicand: ");
                scanf("%s", Multiplicand);
                printf("Multiplier  : ");
                scanf("%s", Multiplier);
                printf("Result      : ");
                printf("%s\n", MUL(pointerflushUNALIGN, Multiplicand, Multiplier));
                free(pointerflushUNALIGN); free(Multiplicand); free(Multiplier);
                return 0;
        } else {
                free(pointerflushUNALIGN); free(Multiplicand); free(Multiplier);
                return 3;
        }
}

As a side note:
Who can speed up the 2^2^20 (315,653 digits long number) calculation, say, 1000 times as a start?
On my 'Bonboniera' laptop it took 54 minutes i.e. 3000+ seconds - should be 3- seconds.

A challenge for 'MokujIN':
Who can wait for 'MokujIN.exe 2 43112609 /stats' to complete?!
I expect at least a week of crunching.
Code:
D:\_KAZE\MokujIN>MokujIN_Microsoft_32-bit_Version_16.exe 2 43112609 /stats
Multiplying performance for operands 1 digits long: 1 MokujINs i.e. digits per second.
Multiplying performance for operands 1 digits long: 1 MokujINs i.e. digits per second.
Multiplying performance for operands 2 digits long: 4 MokujINs i.e. digits per second.
Multiplying performance for operands 3 digits long: 9 MokujINs i.e. digits per second.
Multiplying performance for operands 5 digits long: 25 MokujINs i.e. digits per second.
Multiplying performance for operands 10 digits long: 100 MokujINs i.e. digits per second.
Multiplying performance for operands 20 digits long: 400 MokujINs i.e. digits per second.
Multiplying performance for operands 39 digits long: 1521 MokujINs i.e. digits per second.
Multiplying performance for operands 78 digits long: 6084 MokujINs i.e. digits per second.
Multiplying performance for operands 155 digits long: 24025 MokujINs i.e. digits per second.
Multiplying performance for operands 309 digits long: 95481 MokujINs i.e. digits per second.
Multiplying performance for operands 617 digits long: 380689 MokujINs i.e. digits per second.
Multiplying performance for operands 1234 digits long: 1522756 MokujINs i.e. digits per second.
Multiplying performance for operands 2467 digits long: 6086089 MokujINs i.e. digits per second.
Multiplying performance for operands 4933 digits long: 24334489 MokujINs i.e. digits per second.
Multiplying performance for operands 9865 digits long: 97318225 MokujINs i.e. digits per second.
Multiplying performance for operands 19729 digits long: 77846688 MokujINs i.e. digits per second.
Multiplying performance for operands 39457 digits long: 77842742 MokujINs i.e. digits per second.
Multiplying performance for operands 78914 digits long: 78828093 MokujINs i.e. digits per second.
Multiplying performance for operands 157827 digits long: 78331326 MokujINs i.e. digits per second.
Multiplying performance for operands 315653 digits long: 78207862 MokujINs i.e. digits per second.
Multiplying performance for operands 631306 digits long: 78223212 MokujINs i.e. digits per second.
...

Well, the dummy math shows that at 78 MegaMokujINs speed multiplying two 631306 digits long numbers would take:
631306*631306 iterations of main loop which gives
398,547,265,636 / 78,000,000 = 5,109 seconds
But our final number (12+ million digits) needs two ten times longer numbers than above ones, so:
6,000,000*6,000,000/ 78,000,000 = 461,538 seconds
Ouch, it is 128 hours or one week.

And the assembly dump of MUL function main loop:
Code:
// The following assembly/code dump is the main loop of MokujIN (namely: 'MUL' function):
// It is 32b-290+6 = 161 bytes long.
// D:\_KAZE\MokujIN>cl /Ox MokujIN.c /FAcs
// Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.30319.01 for 80x86
/*
$LL14@MUL:

; 122  :        for (QB=1; QB<=ErLength; QB++) {

  00290 83 ff 01         cmp     edi, 1
  00293 0f 82 87 00 00
        00               jb      $LN13@MUL
  00299 8b 54 24 24      mov     edx, DWORD PTR _ErLength$[esp+256]
  0029d 8d 45 ff         lea     eax, DWORD PTR [ebp-1]
  002a0 8d 34 01         lea     esi, DWORD PTR [ecx+eax]
  002a3 8b 4c 24 1c      mov     ecx, DWORD PTR _Multiplier$GSCopy$[esp+256]
  002a7 8d 7c 39 ff      lea     edi, DWORD PTR [ecx+edi-1]
  002ab 8b cd            mov     ecx, ebp
  002ad 2b c8            sub     ecx, eax
  002af 89 4c 24 30      mov     DWORD PTR tv1469[esp+256], ecx
  002b3 89 54 24 28      mov     DWORD PTR tv771[esp+256], edx
  002b7 eb 07 8d a4 24
        00 00 00 00      npad    9
$LL30@MUL:

; 123  :                CarryFlag = 0;
; 124  :                Cycle = QB - 1 + SF;
; 125  :                // In C the offset starts from 0 whereas in QuickBasic from 1, therefore '[SF]' becomes '[SF-1]'.
; 126  :                // Here the strings are not reversed as in QB arrays, so [CandLength-SF+1] becomes [(CandLength-SF+1)-1].
; 127  :                //Tiller = (Multiplicand[(CandLength-SF+1)-1]-'0') * (Multiplier[(ErLength-QB+1)-1]-'0');
; 128  :                // LOOK-UP TABLE BOOST: Above COMMENTED line becomes next two:
; 129  :                TillerLeastSignificantDigit = LSDarray[ Multiplicand[(CandLength-SF+1)-1]-'0' ][ Multiplier[(ErLength-QB+1)-1]-'0' ];
; 130  :                TillerMostSignificantDigit = MSDarray[ Multiplicand[(CandLength-SF+1)-1]-'0' ][ Multiplier[(ErLength-QB+1)-1]-'0' ];

  002c0 8b 44 24 14      mov     eax, DWORD PTR tv1486[esp+256]
  002c4 0f b6 00         movzx   eax, BYTE PTR [eax]
  002c7 8d 0c 80         lea     ecx, DWORD PTR [eax+eax*4]
  002ca 0f b6 07         movzx   eax, BYTE PTR [edi]
  002cd 8d 04 48         lea     eax, DWORD PTR [eax+ecx*2]
  002d0 8a 8c 04 24 fe
        ff ff            mov     cl, BYTE PTR _MSDarray$[esp+eax-272]

; 131  :                //Result[Cycle-1] = (Tiller%10) + Result[Cycle-1];
; 132  :                // LOOK-UP TABLE BOOST: Above COMMENTED line becomes next one:
; 133  :                Result[Cycle-1] = TillerLeastSignificantDigit + Result[Cycle-1];

  002d7 8a 84 04 88 fe
        ff ff            mov     al, BYTE PTR _LSDarray$[esp+eax-272]
  002de 02 06            add     al, BYTE PTR [esi]
  002e0 32 d2            xor     dl, dl
  002e2 88 06            mov     BYTE PTR [esi], al

; 134  :                if ( Result[Cycle-1] >= 10 ) {

  002e4 3c 0a            cmp     al, 10                 ; 0000000aH
  002e6 72 06            jb      SHORT $LN8@MUL

; 135  :                        Result[Cycle-1] = Result[Cycle-1] - 10;

  002e8 2c 0a            sub     al, 10                 ; 0000000aH
  002ea 88 06            mov     BYTE PTR [esi], al

; 136  :                        CarryFlag = 1;

  002ec b2 01            mov     dl, 1
$LN8@MUL:

; 139  :                //Result[NextNumPos-1] = Result[NextNumPos-1] + CarryFlag + (unsigned char)(Tiller/10);
; 140  :                // LOOK-UP TABLE BOOST: Above COMMENTED line becomes next one:
; 141  :                Result[NextNumPos-1] = Result[NextNumPos-1] + CarryFlag + TillerMostSignificantDigit;

  002ee 02 ca            add     cl, dl
  002f0 00 4e 01         add     BYTE PTR [esi+1], cl

; 142  :                while (Result[NextNumPos-1] >= 10) {

  002f3 80 7e 01 0a      cmp     BYTE PTR [esi+1], 10   ; 0000000aH
  002f7 72 13            jb      SHORT $LN10@MUL

; 137  :                }
; 138  :                NextNumPos = Cycle + 1;

  002f9 8b 44 24 30      mov     eax, DWORD PTR tv1469[esp+256]
  002fd 03 c6            add     eax, esi
  002ff 90               npad    1
$LL7@MUL:

; 143  :                        Result[NextNumPos-1] = Result[NextNumPos-1] - 10;

  00300 80 00 f6         add     BYTE PTR [eax], 246    ; 000000f6H

; 144  :                        NextNumPos = NextNumPos + 1;
; 145  :                        Result[NextNumPos-1] = Result[NextNumPos-1] + 1;

  00303 fe 40 01         inc     BYTE PTR [eax+1]
  00306 40               inc     eax
  00307 80 38 0a         cmp     BYTE PTR [eax], 10     ; 0000000aH
  0030a 73 f4            jae     SHORT $LL7@MUL
$LN10@MUL:

; 139  :                //Result[NextNumPos-1] = Result[NextNumPos-1] + CarryFlag + (unsigned char)(Tiller/10);
; 140  :                // LOOK-UP TABLE BOOST: Above COMMENTED line becomes next one:
; 141  :                Result[NextNumPos-1] = Result[NextNumPos-1] + CarryFlag + TillerMostSignificantDigit;

  0030c 46               inc     esi
  0030d 4f               dec     edi
  0030e ff 4c 24 28      dec     DWORD PTR tv771[esp+256]
  00312 75 ac            jne     SHORT $LL30@MUL

; 122  :        for (QB=1; QB<=ErLength; QB++) {

  00314 8b 4c 24 20      mov     ecx, DWORD PTR _SF$[esp+256]
  00318 8b 7c 24 24      mov     edi, DWORD PTR _ErLength$[esp+256]
  0031c 8b 44 24 10      mov     eax, DWORD PTR _CandLength$[esp+256]
$LN13@MUL:

; 118  :        }
; 119  : 
; 120  :        // In C the offset starts from 0 whereas in QuickBasic from 1, therefore '<' becomes '<='.
; 121  :        for (SF=1; SF<=CandLength; SF++) {

  00320 ff 4c 24 14      dec     DWORD PTR tv1486[esp+256]
  00324 41               inc     ecx
  00325 89 4c 24 20      mov     DWORD PTR _SF$[esp+256], ecx
  00329 3b c8            cmp     ecx, eax
  0032b 0f 86 5f ff ff
        ff               jbe     $LL14@MUL
$LN12@MUL:
*/

By the way:
2^43112609 = 3.1647026933025592314345372394933751605410618847e+12978188

Enjoy!
Edited by Sanmayce - 11/7/12 at 8:50am
New Posts  All Forums:Forum Nav:
  Return Home
  Back to Forum: Coding and Programming
Overclock.net › Forums › Software, Programming and Coding › Coding and Programming › High-precision program that calculates 2^n