Tech ARP Forums

11th May 2008, 03:44 AM   #1 (permalink)
graysky
Getting there
Join Date: 18 Aug 2007
Posts: 167
Reputation: 249
Rep Power: 3
Memory bandwidth tests... any real differences (part 2)

About 7 months ago I posted data comparing two memory dividers (1:1 and 3:5 @ 333 MHz) on my then Q6600/P965 based system and concluded that for the 67 % increase in memory bandwidth, the marginal gains in actual performance weren't worth the extra voltage/heat.

Since then I've upgraded my hardware to an X3360/P35 setup and wanted to revisit this issue. Again, two dividers were compared, this time at each of two FSB settings: one pair running at 8.5x333=2.83 GHz, and another running at 8.5x400=3.40 GHz:

333 MHz FSB:
1:1 a.k.a. PC2-5300 (667 MHz)
5:8 a.k.a. PC2-8500 (1,067 MHz)

400 MHz FSB:
1:1 a.k.a. PC2-6400 (800 MHz)
4:5 a.k.a. PC2-8000 (1,000 MHz)
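
For anyone who wants to check the divider math, here is a minimal Python sketch. It assumes the ratios are written FSB:DRAM, that DDR2 moves two transfers per clock, and that the PC2 rating is the peak for one 64-bit (8-byte) channel; the function names are mine, not from any tool.

```python
def ddr2_rate(fsb_mhz, fsb_part, dram_part):
    """Effective DDR2 data rate (MT/s) for an FSB:DRAM divider."""
    dram_clock_mhz = fsb_mhz * dram_part / fsb_part
    return 2 * dram_clock_mhz  # double data rate: two transfers per clock

def peak_mb_per_s(rate_mt_s):
    """Theoretical peak of one 64-bit channel (8 bytes per transfer)."""
    return rate_mt_s * 8

# The four configurations tested (333 MHz is nominally 333.33 MHz).
for fsb, a, b in [(333.33, 1, 1), (333.33, 5, 8), (400, 1, 1), (400, 4, 5)]:
    rate = ddr2_rate(fsb, a, b)
    print(f"{fsb:.0f} MHz FSB, {a}:{b} -> DDR2-{rate:.0f}, "
          f"~{peak_mb_per_s(rate):.0f} MB/s peak")
```

The ratio of the two rates at each FSB gives the 60 % and 25 % bandwidth increases quoted in this post.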

I figured there would be a much greater difference in the 333 MHz FSB case since the memory bandwidth increased by 60 % vs. 25 % in the 400 MHz FSB case. All other BIOS settings were held constant with the exception of the divider (and the strap) and the given FSB. Subtimings were set to auto, so they could vary as managed by the board. I found out this was required, since manually setting some of the subtimings led to either an incomplete POST or an unstable system.

The benchmarks were broken down into three categories:
1) "Real-World" Applications
2) 3D Games
3) Synthetic Benchmarks

The following "real-world" apps were chosen: x264, WinRAR, and the trial version of Photoshop CS3. All were run on a freshly installed copy of Windows XP Pro x64 SP2 w/ all relevant hotfixes. The 3D games were just Doom3 (an older game) and Crysis (a newer game). Finally, I threw in some synthetic benchmarks consisting of the WinRAR self test, Super Pi-mod, and Everest's synthetic memory benchmark. Here is an explanation of the specifics:

Trial of Photoshop CS3 – The batch function in PSCS3 v10.0.1 was used to process a total of fifty-six 10.1 MP jpeg files (226 MB in total):

1) bicubic resize 10.1 MP to 2.2 MP (3872x2592 --> 1800x1200) which is the perfect size for a 4x6 print @ 300 dpi.
2) smart sharpen (120 %, 0.9 px radius, more accurate, lens blur setting)
3) auto levels
4) saved the resulting files as a quality 10 jpg.

Benchmark results are an average of two runs timed with a stopwatch.
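
As a quick sanity check on the resize step above, the pixel arithmetic can be sketched in Python. The helper names are just for illustration:

```python
def megapixels(width, height):
    """Pixel count in millions."""
    return width * height / 1e6

def print_pixels(inches, dpi):
    """Pixels needed along one edge of a print at a given dpi."""
    return inches * dpi

print(f"source: {megapixels(3872, 2592):.2f} MP")   # about 10 MP
print(f"target: {megapixels(1800, 1200):.2f} MP")   # about 2.2 MP
# A 6x4 inch print at 300 dpi needs 1800x1200 pixels.
print(print_pixels(6, 300), "x", print_pixels(4, 300))
```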

RAR version 3.71 – rar.exe ran my standard backup batch file, which generated about 955 MB of rars containing 5,210 files in total. Here is the command line used:
Code:
rar a -m3 -md4096 -v100m -rv40p -msjpg;mp3;tif;avi;zip;rar;gpg;jpg "f:\Backups\Backup.rar" @list.txt
where list.txt is a list of all the target files/dirs included in the backup set. Benchmark results are an average of two runs timed with a stopwatch.

x264 Benchmark HD – Automatically runs a 2-pass encode on the same 720p MPEG-2 (1280x720 DVD source) file four times in total. It contains two versions of x264.exe and runs the test on both. The benchmark is the best three of four runs (FPS) converted to total encode time.

Shameless promotion --> you can read more about the x264 Benchmark HD at this URL which contains results for hundreds of systems. You can also download the benchmark and test your own machine.

3D Games Based Benchmarks

Doom3 - Ran timedemo demo1 a total of three times and averaged the FPS as the result. Settings were 1,280x1,024, ultra quality with 8x AA.

Crysis - Ran the included "Benchmark_CPU.bat" and "Benchmark_GPU.bat", both of which run the pre-defined timedemo, looped four times. I took the best three of four (average FPS) and averaged them together as the benchmark. Settings were 1,024x768, very high for all (used the DX9 very high settings hack), and 2x AA.
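
The "best three of four" scoring used for x264 and Crysis can be sketched roughly like this. The FPS values and the frame count below are made up for illustration (the post doesn't list them), and the function names are my own:

```python
def best_n_average(fps_runs, n=3):
    """Average the n highest FPS values, discarding the slowest runs."""
    if len(fps_runs) < n:
        raise ValueError("need at least n runs")
    best = sorted(fps_runs, reverse=True)[:n]
    return sum(best) / n

def encode_time_seconds(frame_count, fps):
    """Convert an average encode rate (FPS) into total encode time."""
    return frame_count / fps

# Hypothetical numbers, not measurements from this thread.
runs = [30.0, 32.0, 31.0, 25.0]
avg_fps = best_n_average(runs)
print(avg_fps, encode_time_seconds(1000, avg_fps))
```

Dropping the slowest run is a cheap way to keep a one-off hiccup (disk activity, a background task) from dragging the average down.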

"Synthetic" Application Based Tests

WinRAR version 3.71 – If you hit alt-B in WinRAR, it'll run a synthetic benchmark. This was run twice (stopped after 150 MB) and is the average of four runs.

SuperPI / mod1.5 XS – The 16M test was run twice, and the average of the two is the benchmark.

Everest v4.50.1330 Memory Benchmark - Ran this benchmark a total of three times and averaged the results.

Hardware specs:
Code:
D.F.I. LP LT P35-TR2 (BIOS: LP35D317)
Intel X3360 @ 8.5x400=3.40 GHz
Corsair Dominator DDR2-1066 (TWIN2X4096-8500C5DF)
   2x 2 GB @ 5-5-5-15 (all subtimings on auto)

 (tRD=8) @ 667 MHz (1:1) @ 2.100V
 (tRD=7) @ 1,066 MHz (5:8) @ 2.100V
 (tRD=8) @ 800 MHz (1:1) @ 2.100V
 (tRD=6) @ 1,000 MHz (4:5) @ 2.100V

EVGA Geforce 8800GTS (G92) w/ 512 meg
Core=770 MHz
Shader=1,923 MHz
Memory=2,000 MHz
Note: the performance levels (tRD) are set automatically by the board which wouldn't POST if I manually tweaked them. Even though they're different, I still feel the data are valid since this is the only way I can run them. In other words, if I'm going to run the higher dividers, it'll be as such or it won't POST!

Without further ado, here are the data, starting with a 333 MHz FSB comparing the 1:1 vs. 5:8 divider (DDR2-667 vs. DDR2-1066):


Here are the averaged data visualized graphically:


Now on to the 400 MHz FSB comparing the 1:1 vs. 4:5 divider (DDR2-800 vs. DDR2-1000):


And graphically:


As you can see, there was nothing spectacular in either the real-world category or the 3D games category in comparison to the massive increase in memory bandwidth (shown on the graphs in red). In fact, I was surprised to see that there were really no gains in Doom3 and minimal gains in Crysis. This is probably because the video card shoulders the burden of these games, with Doom3 being the lightweight of the two. As expected, the synthetic benchmarks did pick up on the larger bandwidth, but only in the case of the 400 MHz FSB did I see anything approaching the theoretical increase (a 14 % gain against a 25 % bandwidth increase vs. 15 % against 60 %).
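
To put the synthetic results in perspective, here is a small Python sketch of how much of the theoretical bandwidth increase actually showed up. It reads the figures above as "observed gain vs. bandwidth increase", which is my interpretation of the numbers:

```python
def realized_fraction(observed_gain_pct, bandwidth_gain_pct):
    """Share of the theoretical bandwidth increase that showed up."""
    return observed_gain_pct / bandwidth_gain_pct

cases = {
    "400 MHz FSB (1:1 vs. 4:5)": (14, 25),
    "333 MHz FSB (1:1 vs. 5:8)": (15, 60),
}
for label, (observed, theoretical) in cases.items():
    share = realized_fraction(observed, theoretical)
    print(f"{label}: {share:.0%} of the theoretical increase")
```

So roughly half of the extra bandwidth was realized at 400 MHz FSB, but only about a quarter at 333 MHz FSB.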

If you read my first memory bandwidth post, perhaps the same conclusions can be drawn from these new data. One thing I'll add is that this new MB doesn't require extra voltage like my older P5B-Deluxe did to run the higher dividers, so it's not producing that much more heat. That said, I'm actually running the system with the 4:5 divider, since things seem to feel faster to me (windows opening, responsiveness, etc.) which are all unfortunately intangibles I can't measure.
__________________
http://encoding.n3.net <--- for all your DVD and audio CD backup needs!


Last edited by graysky : 12th May 2008 at 02:25 AM.
11th May 2008, 08:28 AM   #2 (permalink)
Mac Daddy
Pickin' Da Gitfiddle
Join Date: 19 Nov 2007
Location: Canada
Posts: 1,906
Reputation: 802
Rep Power: 10

It all comes down to latency. No matter how hard you run it or overvolt it, latency is the key. What numbers do you get running Everest, bro?
11th May 2008, 09:04 AM   #3 (permalink)
Chai
Administrator
Join Date: 6 Oct 2002
Location: Maranello
Posts: 25,809
Reputation: 3674
Rep Power: 67

Quote:
Originally Posted by Mac Daddy
It all comes down to latency. No matter how hard you run it or overvolt it, latency is the key. What numbers do you get running Everest, bro?

Latency will show similar results, or worse.
__________________
Chai (Contributor & Forum Admin)
http://www.techarp.com/

Intel E8400 o/c 4.05GHz, Abit IP35 Pro, Corsair TWIN2X4096-6400C5DHX, Asus EN8800GTS 512MB, WD Raptor WD740, WD 5000AAKS x2, Antec TPII-480, EMU 0404 PCI
11th May 2008, 05:48 PM   #4 (permalink)
graysky

I just last night discovered the 266/667 strap on my MB and have been p95'ing it (4:5 divider = 1,000 MHz). I'm guessing it'll be a tad quicker than the 5:6.

Subtimings for the untested 4:5 since I'm booted as such:
Code:
CAS Latency (CL)	5T
RAS To CAS Delay (tRCD)	5T
RAS Precharge (tRP)	5T
RAS Active Time (tRAS)	15T
Row Refresh Cycle Time (tRFC)	52T
Command Rate (CR)	2T
RAS To RAS Delay (tRRD)	3T
Write Recovery Time (tWR)	14T
Read To Read Delay (tRTR)	Same Rank: 4T, Different Rank: 6T
Read To Write Delay (tRTW)	8T
Write To Read Delay (tWTR)	Same Rank: 11T, Different Rank: 5T
Write To Write Delay (tWTW)	Same Rank: 4T, Different Rank: 6T
Read To Precharge Delay (tRTP)	5T
Write To Precharge Delay (tWTP)	14T
Precharge To Precharge Delay (tPTP)	1T
Refresh Period (tREF)	2600T
DRAM Read ODT	3T
DRAM Write ODT	6T
MCH Read ODT	8T
Performance Level	6
Read Delay Phase Adjust	+9T
DIMM1 Clock Fine Delay	13T
DIMM2 Clock Fine Delay	4T
DIMM3 Clock Fine Delay	12T
DIMM4 Clock Fine Delay	3T
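
If it helps to see these in absolute terms, here is a rough Python sketch converting cycle counts to nanoseconds. It assumes the "T" values are DRAM clock cycles and that the DRAM clock is half the DDR2 data rate; the helper name is made up:

```python
def cycles_to_ns(cycles, data_rate_mt_s):
    """Convert a timing in DRAM clock cycles to nanoseconds."""
    clock_mhz = data_rate_mt_s / 2  # DDR2 clock is half the data rate
    return cycles * 1000 / clock_mhz

# A few of the primary timings from the listing above.
timings = {"CL": 5, "tRCD": 5, "tRP": 5, "tRAS": 15, "tRFC": 52}
for name, t in timings.items():
    print(f"{name}: {t}T = {cycles_to_ns(t, 1000):.0f} ns at DDR2-1000")
```

The same 5T CAS latency works out to 10 ns at DDR2-1000 but 12.5 ns at DDR2-800, which is why identical timings at a higher divider still lower the absolute latency.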
11th May 2008, 07:39 PM   #5 (permalink)
Mac Daddy

Quote:
Originally Posted by Chai
Latency will show similar results, or worse.

Raising the voltage to the rated 2.1 V from the DDR2-800 default of 1.8 V and increasing the memory bus from 400 to 522 drops my latency from 69 ns to 56.1 ns and raises my memory reads and writes from 7.2 Gbps to 8.5 Gbps.
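
For comparison with the percentages earlier in the thread, the relative changes in those figures can be worked out with a short Python sketch (the helper name is just for illustration):

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100

print(f"memory bus: {pct_change(400, 522):+.1f} %")
print(f"latency:    {pct_change(69, 56.1):+.1f} %")
print(f"read/write: {pct_change(7.2, 8.5):+.1f} %")
```

That is about a 30 % higher memory bus for roughly 18 % lower latency and 18 % more measured bandwidth.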
11th May 2008, 10:31 PM   #6 (permalink)
Chai

Quote:
Originally Posted by Mac Daddy
Raising the voltage to the rated 2.1 V from the DDR2-800 default of 1.8 V and increasing the memory bus from 400 to 522 drops my latency from 69 ns to 56.1 ns and raises my memory reads and writes from 7.2 Gbps to 8.5 Gbps.

That is only for memory benchmarks. In real-world apps, the difference is usually less than 1%.
12th May 2008, 01:31 AM   #7 (permalink)
ZuePhok
Hold me back! I can't stop posting!!!
Join Date: 16 Dec 2002
Location: Floating Island Of Mandango
Posts: 8,465
Reputation: 2724
Rep Power: 41

Memory bandwidth has never been an issue with the 4/6 MB C2D/C2Q. Intel has pretty solid manufacturing capabilities; they can afford to use more transistors to make a larger cache.
Even though Nehalem features an IMC, it's still fed by a large pie of L3 cache. I once asked an Intel senior fellow if the implementation of an IMC would allow Intel to bring down the transistor count of their upcoming CPUs (pretty good trick, right? 30 million for an IMC to replace a 200-million 4 MB cache), so that more chips could be made out of a wafer. Can an IMC replace a large cache? The answer is NO. Even with an IMC, accessing the system memory is still very, very expensive, so I suppose Nehalem probably won't respond very well to fast system memory. This is actually good news for us if it's true. In the past, I spent more on high-performance system memory than on a K8 mobo/CPU because the K8 chips required uber-fast memory to perform at their maximum potential. It's just ridiculous. Now we kind of know just how inferior the design of the K8 caching system is.
__________________
DYKT: Our DNA can expired? Thank god my science class wasn't taught by Pak Lah
12th May 2008, 02:20 AM   #8 (permalink)
Mac Daddy

Quote:
Originally Posted by Chai
That is only for memory benchmarks. In real-world apps, the difference is usually less than 1%.

On a 32-bit OS like XP SP2 or Vista 32-bit I would tend to agree with you, but I don't think Graysky's results reflect a 64-bit OS like Vista Ultimate x64 that properly utilizes more than 2.5 GB of RAM.

"Real World" applications show a significant improvement for me with x64: even the loading of the OS itself, browser speed, program speed upon initial opening, compression programs ... games ...

If Graysky wishes me to research how this affects OSes that utilize more than 2 GB of physical RAM effectively, then I will.
12th May 2008, 02:42 AM   #9 (permalink)
Mac Daddy

Quote:
Originally Posted by ZuePhok
memory bandwidth has never been an issue with the 4/6mb C2D C2Q. Intel has pretty solid manufacturing capabilities. they can afford to use more transistors to make larger cache. [...]

Nice post !!

L2 cache is very important, and with the above changes the L2 latency on my E6550 also dropped from 4.2 ns to 3.6 ns. Bro, I know you're not into synthetic benchmarks, and I hear ya, but they do provide faster feedback on changes than, say, playing a game for a few hours then going .. hmmm .. seems faster.

Vista, in my limited experience, handles memory very well but not L2 cache. Most of the instabilities I have encountered O/Cing under Vista Ultimate x64 and Ubuntu Linux trace back to the L2 cache being stressed and nothing else.

Anyway nice post
12th May 2008, 05:53 AM   #10 (permalink)
Mac Daddy

Quote:
Originally Posted by graysky
I just last night discovered the 266/667 strap on my MB and have been p95'ing it (4:5 divider = 1,000 MHz). I'm guessing it'll be a tad quicker than the 5:6.

Subtimings for the untested 4:5 since I'm booted as such: [...]

Try loosening your tRFC to 60 and tightening up your other timings. Then drop your tRFC again, making sure to check for stability.