How a modal window could break the event cycle: A debugging story

A couple of weeks ago I started an aggressive cleaning on Morphic, the graphics framework we currently use in Pharo, searching for a cheap and simple path to move the Pharo IDE to HiDPI screens. The cleanups involved mostly the removal of global state. For example, I transformed code like the following:

ActiveHand doSomething.

into something like:

self activeHand doSomething.

Morph >> activeHand [
  ^ self activeWorld activeHand
]

Turns out in the middle I generated (or actually unveiled?) a couple of bugs :). Yet, I was actually surprised that I did not generate MORE bugs. So this post is a story about debugging. Because I think we don’t discuss debugging enough, and inexperienced developers often have problems to find good strategies to debug.

The bug(z)

About two weeks ago I did two pull requests to Pharo cleaning a bit the mess on Morphic’s global state. These were PR 5916 and PR 5926. A week later a couple of bugs started to pop up. This story is about issue 5945 and issue 6003.

The insect in question was showing itself as a problem managing modal windows and events. When trying to rename a method protocol a pop-up opened up, and then no matter where you clicked it kept in popping-up again and again and again.

Original screenshot showing the problem by @Myroslava.
Taken from issue https://github.com/pharo-project/pharo/issues/5945
Original screenshot showing the problem by @Myroslava.
Taken from issue https://github.com/pharo-project/pharo/issues/5945

Finding the bug in the Dark

If you follow the discussion in the issue, you’ll find my first approach for it was to try reproduce it. And I couldn’t. Having an unreproducible bug is kind of a hazzle, because it makes me ask lots of questions, and not necessarily in the good direction. Is the bug dependent on @Myroslava’s environment? Is it Windows? Because I was on OSX. Was it that she was using an old(ish) Pharo version? Because I was in the latest dev one. A couple of ping-pong of questions gave me the information I needed, but I felt that this kind of information is needed for all bugs, and it’s really a time saver when you’re chasing a nasty cochroach.

That day (was it that day? I don’t remember, but allow me take some poetical licence here) I set-up a couple of github’s issue templates. And I’m happy a couple of weeks later to see that issue reports have got better 🙂

Now, coming back to the story, I had to explore under the carpet to find the bug, and I searched in many different places. First I tried comparing the behavior between the button in the status bar and the context menu. Then I tried to compare the behavior between the calypso browser and the message browser. Then within calypso, I compared the class search dialog with the offending dialog. This looked like a good approach, because the class search dialog was not showing the issue. But it turned out a dead end, because both dialogs are implemented in a **completely different* way… In the middle of the exploration, I saw the bug a couple of times, but I could not grasp what was causing it in some cases, and not in others.

I needed a reliable way to trigger it.

The epiphany

Then I got the “House MD” moment. The “Ah” that pops up in your head. But no vicodin involved. It turned out, that the bug only appeared from the control in the status bar.

And that control has three sub controls. A delete icon button, an edit icon button, and a label. The offending popup was being created when clicking the edit icon button, and when clicking on the label. The bug always happened when clicking on the label. Even more! The bug only happened when clicking on the label.

Gotcha!

A strategy to quickly catch the bug

Now I knew how to make the mouse go out of the wall. However, what we were seeing until now was just a bug symptom. I was not able to tell for sure if this bug was specific to this specific control in the calypso browser that was unveiled by my refactoring, if this was a more generalized problem, or if I did something wrong during my early refactorings. I did not know. So I had to find the underlying cause of the bug.

But I had two weapons at hand. I had two cases opening the same modal pop-up, one showing the bug an another one not showing it. So I proceeded to dissect them, and compare them. First attempt: I found the **single** place where the pop-up was being open, and I logged the stack at that point, to see if and how executions differ.

thisContext stack traceCr.

Turns out this first attempt gave me enough information to hypothetize about the issue! Let’s see the two stacks, first the one showing the bug (see it starts from LabelMorph >> click:), the second one not showing the bug.

ClyQueryBrowserContext(ClySystemBrowserContext)>>requestSingleMethodTag:suggesting: ClyMethodTagsAndPackageEditorMorph>>requestTag [
		self isExtensionActive 
			ifTrue: [ self requestPackage]
			ifFalse: [ self requestTag ]
	] in ClyMethodTagsAndPackageEditorMorph>>openEditor
BlockClosure>>on:do:
ClyMethodTagsAndPackageEditorMorph>>requestChangeBy:
ClyMethodTagsAndPackageEditorMorph>>openEditor
MorphEventSubscription>>notify:from: [ :s | result := result | ((s notify: anEvent from: sourceMorph) == true) ] in MorphicEventHandler>>notifyMorphsOfEvent:ofType:from: Set>>do: MorphicEventHandler>>notifyMorphsOfEvent:ofType:from:
MorphicEventHandler>>click:fromMorph:
LabelMorph(Morph)>>click:
MouseClickState>>click
MouseClickState>>handleEvent:from:
HandMorph>>handleEvent:
ClyQueryBrowserContext(ClySystemBrowserContext)>>requestSingleMethodTag:suggesting: ClyMethodTagsAndPackageEditorMorph>>requestTag [
		self isExtensionActive 
			ifTrue: [ self requestPackage]
			ifFalse: [ self requestTag ]
	] in ClyMethodTagsAndPackageEditorMorph>>openEditor
BlockClosure>>on:do:
ClyMethodTagsAndPackageEditorMorph>>requestChangeBy:
ClyMethodTagsAndPackageEditorMorph>>openEditor
[target perform: actionSelector withArguments: arguments] in IconicButton(SimpleButtonMorph)>>doButtonAction
BlockClosure>>ensure:
CursorWithMask(Cursor)>>showWhile:
IconicButton(SimpleButtonMorph)>>doButtonAction
IconicButton(SimpleButtonMorph)>>mouseUp:
IconicButton(Morph)>>handleMouseUp:
MouseButtonEvent>>sentTo:
IconicButton(Morph)>>handleEvent:
IconicButton(Morph)>>handleFocusEvent: [
		result := focusHolder handleFocusEvent: 
			(anEvent transformedBy: (focusHolder transformedFrom: self)).
	] in HandMorph>>sendFocusEvent:to:clear:
BlockClosure>>on:do:
WorldMorph>>becomeActiveDuring:
HandMorph>>sendFocusEvent:to:clear:
HandMorph>>sendEvent:focus:clear:
HandMorph>>sendMouseEvent:
HandMorph>>handleEvent:
MouseClickState>>handleEvent:from:
HandMorph>>handleEvent:

Notice that both stacks start from handleEvent:, but one ends up sending a mouseUp: while the other one sends a click: event, and this MouseClickState guy in the middle orchestrating everything with the HandMorph

Some background: how mouse events work

To fix this bug, I had to get my hands dirty and get some details about how mouse events work in Morphic. For those not interested on such details, skip this section, although I think it’s pretty educative ^^.

Morph worlds have hand objects, objects responsible of (among others) dispatching the events to the other morphs in the world.
In each UI cycle, a hand takes the events from a queue of pending OS events, generates Morphic events for the OS event, and dispatches them to the right guy.

Now, the mouse events as they come from the OS have two main flavours: mouseDown and mouseUp. There is no click, nor double click event. They are simulated by the hand.

The process of simulating a click works roughly as follows. When a morph is interested on a click it subscribes to mouse down events. Then, when it receives a mouse down, it has to ask the hand “Hey! I would like clicks!”.
Usually this is done by some code like this:

mouseDown: anEvent
  ...
  anEvent hand waitForClicksOrDrag: h
    event: (anEvent transformedBy: tfm)
    selectors: { nil. nil. nil. #dragTarget:. }
    threshold: 5
  ...

When that happens, the hand changes its state, and stores a MouseClickState object that implements a kind of state machine to detect clicks and double clicks.

Finally, after the mouseDown the morph receives a mouseUp (that should be transformed into a click). The hand, before sending the event to the interested morph, it makes it pass through the state machine which has some code like this:

"Oh, the guy was interested in clicks, and gave me a mouse up following a mouse down in some small enough time interval. Let's click!"
self click.    
aHand handleEvent: firstClickUp.

Where

  1. click will send a click event to the morph and
  2. handleEvent: will send the mouseUp

Fixing the bug

Finding the real fix took more time than the couple of minutes you will spend reading this post. My first fix for it was to invert handleEvent: and click. This fixed the issue in question, but lead to other problems like this one, because now the order of events was altered (and some apps were expecting clicks before mouse ups. I’ll just go to the juicy details.

The problem was caused because during the click event, calypso was opening a modal window. Modal windows do suspend the current execution and do not return the control to the caller until the window is closed. Something like this:

openModal
  self open.
  "Start a sub UI loop that will not return the control to the caller until I'm closed."
  [ self isOpen ] whileTrue: [ self doAnotherUIIteration ].

This meant that the call from click never returned. Thus the mouse up was never dispatched using handleEvent:. And when the sub UI loop starts, the morph with the focus starts receiving new events while the mouse up has never been sent, and the MouseClickState state machine thinks it is still in a click state…

I proposed a solution for this in this pull request. The main idea behind is that the event loop should always dispatch a single event per cycle, like that modal sub-cycles will start with a correct state. So, if like in this case an event (mouse up) generates several events (a click, then a mouseUp), only the first one is dispatched, and the second one is returned to the event queue, for later treatment in a subsequent cycle.

Some conclusion?

Please take care of your bug reports, they are life savers :).

The idea behind comparing executions is super powerful. We have a super smart guy in the team doing a PhD around this topic. This post just shows how to exploit this idea without other tool support than stack access and a log (thisContext stack traceCr).

MouseClickState probably needs more testing and love.

And I hope you had fun ^^

Published by Guille Polito

Pharo dev. Researcher, engineer and father. > If it ain't tested, it does not exist.

Leave a comment