Friday, February 4, 2011

Arithmetic underflow or overflow exception during debugging

This is the day of weird behavior.

We have a Win32 project made with Delphi 2007, which hosts the .NET runtime and calls into .NET to show new forms, as part of a transition period.

Recently we've begun experiencing exceptions at seemingly random points in our code: Arithmetic overflow or underflow.

The stack trace of one of these looks like this:

at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(Int32 dwComponentID, Int32 reason, Int32 pvLoopData)
at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
at System.Windows.Forms.Application.RunDialog(Form form)
at System.Windows.Forms.Form.ShowDialog(IWin32Window owner)
at System.Windows.Forms.Form.ShowDialog()
at Gatsoft.Gat.UI.Windows.Forms.Remanaging.RemanageForm.DelphiOpenInNewMode(String employeeCode, String departmentCode, DateTime date) in C:\Dev\VS.NET\Gatsoft\Gatsoft.Gat.UI.Windows\Forms\Remanaging\RemanageForm.Delphi.cs:line 67

In the Visual Studio solution, one of the outermost class libraries (i.e. one that pulls in all the references it can) has a specific debug program set, targeting the Delphi project output. This allows us to debug .NET code from Visual Studio, even though the main bulk of the program is written in Delphi.

The problem only occurs when run from the debugger, not if we just run the exe file directly (through Explorer, shortcuts, or even Ctrl+F5 inside Visual Studio).

There's apparently no spyware on the machine (as hinted by this).

Any other things we can check?


Edit: It looks like the .NET debugger is enabling these SNaN flags, and the Delphi debugger is not. We'll have to investigate this further, but for now I'll accept @Lorenzo Boccaccia's answer.

Apparently Solved

Ok, it looks like we've finally nailed this problem. It started occurring for our testers without the debugger attached as well, so we had to move it way up the priority list.

Finally we found one thing the affected machines had in common: they are Dell Latitude D620 laptops with an NVIDIA Quadro NVS 110M and an old driver, from back in 2006, that came from the system image used to provision the laptops.

I found one post on the web, though I lost the URL when I rebooted to update the display driver, about a .NET service crashing, mostly when the machine was busy drawing to the screen. One way to reproduce that problem was to open a command prompt at C:\ and run DIR /S to force a massive amount of screen updates, which would trigger the crash.

He too had an NVIDIA video card.

The problem on my machine occurred roughly every 2-4 startups of our program, but after updating the video driver I've had 123 successful startups without any problems. (BTW I can recommend AutoHotKey for such things).
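
For anyone without AutoHotKey at hand, the same kind of repeated-startup check can be sketched in a few lines of Python. This is only a rough illustration, not what we actually ran: the name count_failures is made up here, and it assumes the program under test exits on its own and signals a crash with a non-zero exit code, which a GUI application may well not do.

```python
import subprocess
import sys


def count_failures(command, runs):
    """Run `command` `runs` times and count the non-zero exit codes.

    Assumption: the target process terminates by itself and reports
    a crash through its exit code.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(command)
        if result.returncode != 0:
            failures += 1
    return failures
```

Calling count_failures with the path to the executable (as a one-element list) and a run count such as 100 gives a rough failure rate to compare before and after a driver update.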

So it looks like we've found the culprit, an old/buggy NVIDIA driver.

Updated this question so that perhaps someone in the future can save some time.

Now, if you'll excuse me, I'm going to go cry in a corner.

Jinxed!

I must've jinxed it. No sooner had I posted the above update than a colleague's laptop failed, after updating the video driver.

Still, I'm positive it's a problem outside of our application now, so it just remains to figure out which specific things to update.


From stackoverflow Lasse V. Karlsen

  • Do the errors still occur if you attach the debugger after starting the application?

    crosstalk : The reason this matters is that processes started by the debugger use a special "debug heap". See http://blogs.msdn.com/larryosterman/archive/2008/09/03/anatomy-of-a-heisenbug.aspx or http://msdn.microsoft.com/en-us/library/974tc9t1.aspx .
    Lasse V. Karlsen : Thanks to both of you, I'll check tomorrow. It is indeed a heisenbug, and a random one at that, as I can get a few debug cycles under way before it exhibits itself again.
    Lasse V. Karlsen : Only if I start through the debugger, and only the .NET debugger.
    From rpetrich
  • A debug version of a linked DLL could be compiled with signaling NaN support; see http://blogs.msdn.com/oldnewthing/archive/2008/07/02/8679191.aspx for an example of this problem.

    That heisenbug was caused by uninitialized variables; here there could be a linked DLL enabling the SNaN feature of the CPU and forgetting to disable it upon returning.
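
To make the SNaN/QNaN distinction concrete, here is a small Python sketch that classifies a 64-bit IEEE 754 bit pattern. It assumes the x86 convention, where the top fraction bit set means "quiet" and clear means "signaling". The helper names classify and float_bits are made up for this illustration. The code only inspects bits; it does not read or change the FPU control word, which is the part that decides whether a signaling NaN actually raises an exception.

```python
import struct

EXP_MASK = 0x7FF0000000000000   # the 11 exponent bits of a double
FRAC_MASK = 0x000FFFFFFFFFFFFF  # the 52 fraction bits
QUIET_BIT = 0x0008000000000000  # top fraction bit (x86 convention: set = quiet)


def classify(bits):
    """Return 'snan', 'qnan' or 'not nan' for a 64-bit pattern."""
    # A NaN has all exponent bits set and a non-zero fraction.
    if (bits & EXP_MASK) != EXP_MASK or (bits & FRAC_MASK) == 0:
        return "not nan"  # finite number or infinity
    return "qnan" if (bits & QUIET_BIT) else "snan"


def float_bits(x):
    """Reinterpret a Python float as its 64-bit integer pattern."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


print(classify(0x7FF8000000000000))  # quiet NaN pattern
print(classify(0x7FF0000000000001))  # signaling NaN pattern
print(classify(float_bits(1.0)))     # ordinary number
```

An uninitialized double that happens to hold a pattern of the second kind is harmless while the invalid-operation exception is masked, and turns into a fault as soon as some component unmasks it and does not restore the previous state.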
