What tool to diagnose bad hardware?

Posts: 299
I was given a desktop computer that exhibits all sorts of weird behavior: program crashes, GUI crashes, corrupted files, corrupted downloads, corrupted file systems, spontaneous reboots, lock-ups, etc., in Windows AND Linux. Sometimes I boot up and it won't recognize my keyboard. Other times it won't recognize my mouse. Sometimes it gives me the default /home partition instead of the one defined in fstab (/dev/hdd1).
So far I have changed the CD-R, the data ribbons, both hard-drives. I even added an exhaust fan (thinking the cpu might be overheating). No change. I even tried running one stick of ram at a time. No change.
Maybe I have bad ram? Or a bad CPU? Or a bad BIOS? Or a bad BIOS battery? The computer was running Windows XP before - could there be a virus in the BIOS?
What program do I use to check the CPU, RAM, etc for problems?
re: how to diagnose
Posts: 31
Have you noticed if the problems tend to occur more often when the PC is still cold, or after having been powered up for several minutes? Have you inspected the CPU's fan and heat sink? Does the fan spin, and are both free of accumulated dust? If the CPU is in a ZIF socket, have you checked whether the socket is locked? Have you tried resetting the BIOS to its default settings? Do you have a spare power supply that you can swap with the one currently in your PC?
From your post it appears that you've reseated all the boards & cables in the PC. Do the memory sticks lock into place when you reseat them? Does this PC have both onboard video and a separate video card? If so, have you tried using just the onboard video? Have you tried changing positions of the PCI cards?
re: how to diagnose
Posts: 88
Could be a bad power supply. You might try disconnecting all the drives and cables except the CD and boot from the CD. That won't eliminate the power supply completely, but if it is marginal it might run without problems on a lower load. Worth a try, as bad power supplies can mimic a lot of other problems.
Your eyes and a good light...
Posts: 365
What you're describing sounds like what I've seen occur from bad capacitors on the motherboard.
A lot of motherboards from the late '90s and early '00s had electrolytic capacitors that bulged and leaked electrolyte after a couple of years. The electrolyte is a white or yellowish caustic paste that eventually dries into small crystals. Take a flashlight and give the capacitors a good look (they're the small cylinders standing upright on the board). Any bulging or leakage and you might as well get a new motherboard.
Their purpose is to filter the electrical noise generated by the processor, Northbridge, SouthBridge, etc, out of the power sent to all the chips. As they go bad, the noise gets into all the chips and random glitches start happening. It just gets worse over time and is uneconomical to repair.

I agree, that sounds like a
Posts: 4077
I agree, that sounds like a bad motherboard. Do a memory check first, though.
--
Post questions on www.mepislovers.org too.
Check out our wiki: www.mepislovers-wiki.org

Thanks for the suggestions.
Posts: 299
Thanks for the suggestions. I will try some of these and hope for the best. Will post results soon...
---------------------------------------
Bob L Hunter
bicycle tourist, bookworm, linux newbie
---------------------------------------

I've run into this problem
Posts: 1634
I've run into this problem plenty of times, and it can have a variety of causes. This is what I have ultimately found to be the root of the problem (from most-common to least-common):
Bad power supply (noisy supply rails or under rated for the machine). Confirm with an oscilloscope, and replace it.
Noise on the power supply mains. Confirm with an oscilloscope--making sure that the neutral-to-ground voltage is less than 0.6VAC, no DC-offset--and remediate power mains problems with a dedicated branch circuit or power line conditioner/UPS.
Bad filter caps on the mother board (bulged, leaking, or just dried out). Get your soldering iron out and replace all of them. (Way cheaper than a new MoBo.)
IRQ shared between NIC and video adapter. Re-sequence the PCI cards to enumerate differently--separating high-demand IRQs from one another.
Intermittent connections through an IDE cable. Replace it.
Bad memory module. Replace it.
Corroded copper traces on the motherboard. (The location had a mouse infestation, and the machine had a slot blanking plate missing. Mice were visiting the machine and urinating on the motherboard--corrosive stuff!) Replace the motherboard, of course.
I'm betting that you have a bad/underrated P/S or bad filter caps on the motherboard.

Well, I just replaced the
Posts: 299
Well, I just replaced the ram with two sticks that are known to be good. Still having problems.
The IRQ thing is a possibility. I am using a video card instead of the onboard video.
Re: diagnose bad hardware
Posts: 88
I agree with everything EnigmaOne has said. I can tell he has worked on a lot of broken boxes. I would bet that the problem is either p-supply or bad caps.

Re: Well, I just replaced the
Posts: 565
Well, I just replaced the ram with two sticks that are known to be good. Still having problems. The IRQ thing is a possibility. I am using a video card instead of the onboard video.
============================
Well, Bob, you may have given the answer to your problem. First, the added video card should be in the first slot. Second, the slot next to the video card should be empty, as the first two slots share the same IRQ and video cards do not like to share. If that doesn't help, you can take the computer completely apart, including the power supply, and give it all a good cleaning (an old toothbrush will help, as will a vacuum cleaner), then check those caps before putting it back together.
dmesg for starters
Posts: 613
Open a console and type dmesg. Look for errors and warnings.
Post 'em in this thread.
What model mobo? Can the bios be upgraded? Do you have a means of testing the power supply?
Are your mouse and keyboard ps2 or usb?
What distro/version/kernel are you running?
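If the dmesg output is too long to read by eye, something like the following rough Python sketch can pull out the lines worth posting. The keyword list is only an assumption about what tends to matter, not an official set of kernel messages, and `suspicious_lines` and `read_dmesg` are made-up helper names. On some systems dmesg needs root.

```python
import re
import subprocess

# Assumed keyword list; adjust to taste. Not an official set of kernel messages.
PATTERN = re.compile(r"error|warn|fail|fault|corrupt|timeout", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return the lines of log_text that contain any of the assumed keywords."""
    return [line for line in log_text.splitlines() if PATTERN.search(line)]


def read_dmesg():
    """Run dmesg and return its output as text (may need root on some systems)."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    for line in suspicious_lines(read_dmesg()):
        print(line)
```

A keyword filter like this will miss some problems and flag some harmless lines, so treat it as a way to shorten the log before posting, not as a diagnosis.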

Well, one more possibility
Posts: 299
Well, one more possibility eliminated. I pulled the video card and used the onboard video. Nothing changed. I still get crashes and other weird behavior. So I put the video card back, following your instructions (video card in the first PCI slot, second slot empty, ethernet card in third slot).

To answer a few of the
Posts: 299
To answer a few of the questions here:
Kernel 2.6 something.
ps2 keyboard
usb mouse
The system is an eMachines. I am still waiting for them to email with information about the motherboard.

To answer a few of the
Posts: 565
Bob, I've been wanting to ask you: will it run from the CD OK, or will it still crash? If it runs from the CD OK, then we know that either the HD or the software has a problem. If the computer still crashes, check the CPU fan (it should run fast and smooth), and maybe clean the top of the CPU and heatsink and apply new thermal grease. Something else: just because you know the RAM is good doesn't mean the RAM slots are clean. The same goes for all the rest of your slots and all the pins on all the chips. Is the power supply blowing cool air? Are the caps bulging? Are you overwhelmed by all this?
regards,
Bad Dog
GNU/Linux User - INTEL P-IV 3.0GHz - 2X256 DDR3200
MEPISLite 3.3.2-1 - KDE 3.3.2 - Debian Sarge UpDates
Kernel 2.6.12-1-p4-smp - My iMAC runs Debian Sarge

Live CD works fine
Posts: 299
Live CD works fine (MepisLite, Mepis 3.4 and Linspire) but only MepisLite will install. The others fail. MEPIS 3.4 freezes part way thru and Linspire simply reboots without installing.

What's really weird is...
Posts: 299
This is confusing. I was looking at dmesg and saw that MEPIS thinks I only have 128MB of ram. I know I have 384. So I rebooted and looked in the bios (again !!). BIOS does not see the 256MB stick in slot 0 but it does see the 128MB stick in slot 1. So I removed the stick from slot 1 and now suddenly the BIOS sees the 256MB stick in slot 0. Weird.
Bad stick.
Posts: 613
This is confusing. I was looking at dmesg and saw that MEPIS thinks I only have 128MB of ram. I know I have 384.
A bad stick of RAM would explain all your problems (though it may have just come unseated). If I remember correctly, one can boot the Mepis cd and select 'memtest' in the boot options to test RAM.
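The same mismatch (128MB detected vs. 384MB installed) can be checked from a running Linux system without rebooting into the BIOS. Here is a minimal Python sketch, assuming the usual `MemTotal:` line in /proc/meminfo. The 15% tolerance is a guess to allow for memory the kernel or onboard video holds back, and both function names are made up for illustration.

```python
def mem_total_mb(meminfo_text):
    """Extract MemTotal from /proc/meminfo-style text and return it in MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the value is reported in kB
            return kib / 1024
    raise ValueError("MemTotal not found")


def ram_looks_short(detected_mb, installed_mb, tolerance=0.15):
    """True if detected RAM is more than `tolerance` below what is installed.

    The default tolerance is an assumption, not a documented threshold.
    """
    return detected_mb < installed_mb * (1 - tolerance)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        detected = mem_total_mb(f.read())
    installed = 384  # MB physically installed, per the post quoted above
    print(f"detected {detected:.0f} MB of {installed} MB installed")
    if ram_looks_short(detected, installed):
        print("A module may be unseated, bad, or unsupported by the board.")
```

This only tells you that memory is missing, not why; Memtest is still the tool for finding out whether a stick is actually bad.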

Look closely at the ram pins...
Posts: 145
Do they all line up with each other? Any pins missing? Are we mixing double sided with single sided? (some older boards would get grumpy when mixing) Could be a funky socket, too. This kind of stuff drives me CRAZY, heh, heh.
Good luck,
The Tramp

Well,
Posts: 299
To follow up on this story-
I gave away the PC133 ram. The person I gave it to now reports all the same problems. Weird behavior, lock-ups, crashes, corrupted files, spontaneous reboots. He ran a mem-test and it returned hundreds of errors. The ram was bad for sure.
I have been using this motherboard for a while now with some old PC100 that I had and everything works fine. But I can only use one stick at a time installed. If I try to use more than one stick the weird behavior returns.
I am starting to think the BIOS is broken because most Linux distros will not install. Some lock up but most abort while uncompressing the kernel.
I have used this on hundreds of boxes...
Posts: 311

Is there a BIOS update available?
Posts: 145
Check the manufacturer's website to see what was fixed since your version.
The Tramp

Back to the ram
Posts: 959
Some older boards will run with single sided ram, but not well, while they run double sided ram very well.
Are you mixing single sided with double sided?
How about the notion the board has been subjected to static damage, a lightning strike or has been so overclocked that the sum total of all the minor errors is nothing but a massive headache?
Now something more useful.
Have you done a full bios reset?
Is there any damage to the tracks on the motherboard?
Bad e-caps may not be visible, but in time, they normally show up.
Expansion and contraction may cause the tiny copper contacts in your ram slots to disconnect.
Is the motherboard under tension (bent)?
Is the case too flexible?
Do the pci cards look like they are properly seated?
Is it really worth it?

Re: Back to the ram
Posts: 1634
Is the motherboard under tension (bent)?
That's something I forgot to mention above...thanks.
Had a MoBo that had an unsupported span--right under the DIMM sockets--that would spaz-out every time I changed memory modules. I eventually had to pull the board from the case and layer a couple of sheets of masonite onto the cabinet pan under the DIMM sockets.
Never a problem after that. Something to consider.
Is it really worth it?
Prolly not, but tossing the machine isn't much fun either. (grin)
my 5 year old 700irs
Posts: 7
My 5 year old 700irs eMachines mobo supports 256MB of ram. However, it will not recognize one 256 stick; it takes (2) 128 sticks to max it out. That's why Crucial's memory selector only listed memory up to 128, I found out.

Re: my 5 year old 700irs
Posts: 1634
My 5 year old 700irs eMachines mobo supports 256MB of ram. However, it will not recognize one 256 stick; it takes (2) 128 sticks to max it out. That's why Crucial's memory selector only listed memory up to 128, I found out.
That's pretty-much the norm.
Max_memory / number_of_DIMM-sockets = Max_DIMM_size
The only exception to this (that I've discovered, to date) is the Dell GX1/GXa MoBos, which will actually accept 256MB DIMMs--contrary to Dell's documented per-DIMM socket maximums.
I suppose that it never hurts to try, but don't get your hopes up.
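To put numbers on that rule of thumb, here is a small Python sketch. It assumes the 700irs board has two DIMM sockets (inferred from the "(2) 128 sticks" remark, not confirmed), and the function names are made up for illustration. As noted above, some boards break the rule, so treat the result as a first guess.

```python
def max_dimm_size_mb(max_memory_mb, dimm_sockets):
    """Rule of thumb: Max_memory / number_of_DIMM-sockets = Max_DIMM_size."""
    if dimm_sockets <= 0:
        raise ValueError("need at least one DIMM socket")
    return max_memory_mb // dimm_sockets


def modules_fit(modules_mb, max_memory_mb, dimm_sockets):
    """True if every module is within the per-socket limit and there are enough sockets."""
    limit = max_dimm_size_mb(max_memory_mb, dimm_sockets)
    return len(modules_mb) <= dimm_sockets and all(m <= limit for m in modules_mb)


if __name__ == "__main__":
    # 256MB maximum across an assumed two sockets
    print(max_dimm_size_mb(256, 2))
    print(modules_fit([128, 128], 256, 2))
    print(modules_fit([256], 256, 2))
```

With those assumed figures the per-socket limit comes out at 128MB, so two 128MB sticks pass the check and a single 256MB stick does not, which matches what the board actually did.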
Home of the Point-N-Click Help Files ~ How To Ask Questions the Smart Way: http://www.catb.org/~esr/faqs/smart-questions.html
I used the MEPIS
Posts: 209
I used the MEPIS LiveCD.
It finds most troubles. When booting the LiveCD, choose the Memtest option to check your memory.
You may need to leave it running for several hours to force the fault to show.
You may even have 2 or 3 things faulty, remove everything and put them back one at a time and leave it running off the LiveCD.
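While the box is running off the LiveCD, a crude soak check can also be left going in the background. The Python sketch below hashes the same block of bytes over and over; on healthy hardware every digest should match the first one. This is only an illustration and no substitute for Memtest, since it touches just the memory the OS hands to the process. The buffer size, round count, and the name `soak_check` are arbitrary choices.

```python
import hashlib
import os


def soak_check(data, rounds):
    """Hash the same bytes `rounds` times; return how many digests disagree with the first."""
    reference = hashlib.sha256(data).hexdigest()
    mismatches = 0
    for _ in range(rounds):
        if hashlib.sha256(data).hexdigest() != reference:
            mismatches += 1
    return mismatches


if __name__ == "__main__":
    buffer = os.urandom(64 * 1024 * 1024)  # 64 MB of random bytes; size is arbitrary
    bad = soak_check(buffer, rounds=100)
    print(f"{bad} mismatching digests out of 100 rounds")
```

Any nonzero count points at hardware (RAM, CPU, power) and not at the data, because the input never changes between rounds.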
just some ideas.
NH