Knowing various Python modules for editing spreadsheets, downloading files, and launching programs is useful, but sometimes there just arent any modules for the applications you need to work with. The ultimate tools for automating tasks on your computer are programs you write that directly control the keyboard and mouse. These programs can control other applications by sending them virtual keystrokes and mouse clicks, justpython3- as if you were sitting at your computer and interacting with the applications yourself. This technique is known as graphical user interface automation, or GUI automation for short. With GUI automation, your programs can do anything that a human user sitting at the computer can do, except spill coffee on the keyboard.
Think of GUI automation as programming a robotic arm. You can program the robotic arm to type at your keyboard and move your mouse for you. This technique is particularly useful for tasks that involve a lot of mindless clicking or filling out of forms.
The pyautogui module has functions for simulating mouse movements, button clicks, and scrolling the mouse wheel. This chapter covers only a subset of PyAutoGUIs features; you can find the full documentation at http://pyautogui.readthedocs.org/.
The pyautogui module can send virtual keypresses and mouse clicks to Windows, OS X, and Linux. Depending on which operating system youre using, you may have to install some other modules (called dependencies) before you can install PyAutoGUI.
On Windows, there are no other modules to install.
On OS X, run sudo pip3 install pyobjc-framework-Quartz, sudo pip3 install pyobjc-core, and then sudo pip3 install pyobjc.
On Linux, run sudo pip3 install python3-xlib, sudo apt-get install scrot, sudo apt-get install python3-tk, and sudo apt-get install python3-dev. (Scrot is a screenshot program that PyAutoGUI uses.)
After these dependencies are installed, run pip install pyautogui (or pip3 on OS X and Linux) to install PyAutoGUI.
Appendix A has complete information on installing third-party modules. To test whether PyAutoGUI has been installed correctly, run import pyautogui from the interactive shell and check for any error messages.
Before you jump in to a GUI automation, you should know how to escape problems that may arise. Python can move your mouse and type keystrokes at an incredible speed. In fact, it might be too fast for other programs to keep up with. Also, if something goes wrong but your program keeps moving the mouse around, it will be hard to tell what exactly the program is doing or how to recover from the problem. Like the enchanted brooms from Disneys The Sorcerers Apprentice, which kept fillingand then overfillingMickeys tub with water, your program could get out of control even though its following your instructions perfectly. Stopping the program can be difficult if the mouse is moving around on its own, preventing you from clicking the IDLE window to close it. Fortunately, there are several ways to prevent or recover from GUI automation problems.
Perhaps the simplest way to stop an out-of-control GUI automation program is to log out, which will shut down all running programs. On Windows and Linux, the logout hotkey is CTRL-ALT-DEL. On OS X, it is -SHIFT-OPTION-Q. By logging out, youll lose any unsaved work, but at least you wont have to wait for a full reboot of the computer.
You can tell your script to wait after every function call, giving you a short window to take control of the mouse and keyboard if something goes wrong. To do this, set the pyautogui.PAUSE variable to the number of seconds you want it to pause. For example, after setting pyautogui.PAUSE = 1.5, every PyAutoGUI function call will wait one and a half seconds after performing its action. Non-PyAutoGUI instructions will not have this pause.
PyAutoGUI also has a fail-safe feature. Moving the mouse cursor to the upper-left corner of the screen will cause PyAutoGUI to raise the pyautogui.FailSafeException exception. Your program can either handle this exception with try and except statements or let the exception crash your program. Either way, the fail-safe feature will stop the program if you quickly move the mouse as far up and left as you can. You can disable this feature by setting pyautogui.FAILSAFE = False. Enter the following into the interactive shell:
Here we import pyautogui and set pyautogui.PAUSE to 1 for a one-second pause after each function call. We set pyautogui.FAILSAFE to True to enable the fail-safe feature.
In this section, youll learn how to move the mouse and track its position on the screen using PyAutoGUI, but first you need to understand how PyAutoGUI works with coordinates.
The mouse functions of PyAutoGUI use x- and y-coordinates. Figure18-1 shows the coordinate system for the computer screen; its similar to the coordinate system used for images, discussed in Chapter17. The origin, where x and y are both zero, is at the upper-left corner of the screen. The x-coordinates increase going to the right, and the y-coordinates increase going down. All coordinates are positive integers; there are no negative coordinates.
Figure18-1.The coordinates of a computer screen with 19201080 resolution
Your resolution is how many pixels wide and tall your screen is. If your screens resolution is set to 19201080, then the coordinate for the upper-left corner will be (0, 0), and the coordinate for the bottom-right corner will be (1919, 1079).
The pyautogui.size() function returns a two-integer tuple of the screens width and height in pixels. Enter the following into the interactive shell:
pyautogui.size() returns (1920, 1080) on a computer with a 19201080 resolution; depending on your screens resolution, your return value may be different. You can store the width and height from pyautogui.size() in variables like width and height for better readability in your programs.
Now that you understand screen coordinates, lets move the mouse. The pyautogui.moveTo() function will instantly move the mouse cursor to a specified position on the screen. Integer values for the x- and y-coordinates make up the functions first and second arguments, respectively. An optional duration integer or float keyword argument specifies the number of seconds it should take to move the mouse to the destination. If you leave it out, the default is 0 for instantaneous movement. (All of the duration keyword arguments in PyAutoGUI functions are optional.) Enter the following into the interactive shell:
This example moves the mouse cursor clockwise in a square pattern among the four coordinates provided a total of ten times. Each movement takes a quarter of a second, as specified by the duration=0.25 keyword argument. If you hadnt passed a third argument to any of the pyautogui.moveTo() calls, the mouse cursor would have instantly teleported from point to point.
The pyautogui.moveRel() function moves the mouse cursor relative to its current position. The following example moves the mouse in the same square pattern, except it begins the square from wherever the mouse happens to be on the screen when the code starts running:
pyautogui.moveRel() also takes three arguments: how many pixels to move horizontally to the right, how many pixels to move vertically downward, and (optionally) how long it should take to complete the movement. A negative integer for the first or second argument will cause the mouse to move left or upward, respectively.
You can determine the mouses current position by calling the pyautogui.position() function, which will return a tuple of the mouse cursors x and y positions at the time of the function call. Enter the following into the interactive shell, moving the mouse around after each call:
Of course, your return values will vary depending on where your mouse cursor is.
Being able to determine the mouse position is an important part of setting up your GUI automation scripts. But its almost impossible to figure out the exact coordinates of a pixel just by looking at the screen. It would be handy to have a program that constantly displays the x- and y-coordinates of the mouse cursor as you move it around.
At a high level, heres what your program should do:
This means your code will need to do the following:
Call the position() function to fetch the current coordinates.
Erase the previously printed coordinates by printing b backspace characters to the screen.
Handle the KeyboardInterrupt exception so the user can press CTRL-C to quit.
Open a new file editor window and save it as mouseNow.py.
Start your program with the following:
The beginning of the program imports the pyautogui module and prints a reminder to the user that they have to press CTRL-C to quit.
You can use an infinite while loop to constantly print the current mouse coordinates from mouse.position(). As for the code that quits the program, youll need to catch the KeyboardInterrupt exception, which is raised whenever the user presses CTRL-C. If you dont handle this exception, it will display an ugly traceback and error message to the user. Add the following to your program:
To handle the exception, enclose the infinite while loop in a try statement. When the user presses CTRL-C, the program execution will move to the except clause and Done. will be printed in a new line .
The code inside the while loop should get the current mouse coordinates, format them to look nice, and print them. Add the following code to the inside of the while loop:
Using the multiple assignment trick, the x and y variables are given the values of the two integers returned in the tuple from pyautogui.position(). By passing x and y to the str() function, you can get string forms of the integer coordinates. The rjust() string method will right-justify them so that they take up the same amount of space, whether the coordinate has one, two, three, or four digits. Concatenating the right-justified string coordinates with 'X: ' and ' Y: ' labels gives us a neatly formatted string, which will be stored in positionStr.
At the end of your program, add the following code:
This actually prints positionStr to the screen. The end='' keyword argument to print() prevents the default newline character from being added to the end of the printed line. Its possible to erase text youve already printed to the screenbut only for the most recent line of text. Once you print a newline character, you cant erase anything printed before it.
To erase text, print the b backspace escape character. This special character erases a character at the end of the current line on the screen. The line at uses string replication to produce a string with as many b characters as the length of the string stored in positionStr, which has the effect of erasing the positionStr string that was last printed.
For a technical reason beyond the scope of this book, always pass flush=True to print() calls that print b backspace characters. Otherwise, the screen might not update the text as desired.
Since the while loop repeats so quickly, the user wont actually notice that youre deleting and reprinting the whole number on the screen. For example, if the x-coordinate is 563 and the mouse moves one pixel to the right, it will look like only the 3 in 563 is changed to a 4.
When you run the program, there will be only two lines printed. They should look like something like this:
The first line displays the instruction to press CTRL-C to quit. The second line with the mouse coordinates will change as you move the mouse around the screen. Using this program, youll be able to figure out the mouse coordinates for your GUI automation scripts.
Now that you know how to move the mouse and figure out where it is on the screen, youre ready to start clicking, dragging, and scrolling.
To send a virtual mouse click to your computer, call the pyautogui.click() method. By default, this click uses the left mouse button and takes place wherever the mouse cursor is currently located. You can pass x- and y-coordinates of the click as optional first and second arguments if you want it to take place somewhere other than the mouses current position.
If you want to specify which mouse button to use, include the button keyword argument, with a value of 'left', 'middle', or 'right'. For example, pyautogui.click(100, 150, button='left') will click the left mouse button at the coordinates (100, 150), while pyautogui.click(200, 250, button='right') will perform a right-click at (200, 250).
Enter the following into the interactive shell:
You should see the mouse pointer move to near the top-left corner of your screen and click once. A full click is defined as pushing a mouse button down and then releasing it back up without moving the cursor. You can also perform a click by calling pyautogui.mouseDown(), which only pushes the mouse button down, and pyautogui.mouseUp(), which only releases the button. These functions have the same arguments as click(), and in fact, the click() function is just a convenient wrapper around these two function calls.
As a further convenience, the pyautogui.doubleClick() function will perform two clicks with the left mouse button, while the pyautogui.rightClick() and pyautogui.middleClick() functions will perform a click with the right and middle mouse buttons, respectively.
Dragging means moving the mouse while holding down one of the mouse buttons. For example, you can move files between folders by dragging the folder icons, or you can move appointments around in a calendar app.
PyAutoGUI provides the pyautogui.dragTo() and pyautogui.dragRel() functions to drag the mouse cursor to a new location or a location relative to its current one. The arguments for dragTo() and dragRel() are the same as moveTo() and moveRel(): the x-coordinate/horizontal movement, the y-coordinate/vertical movement, and an optional duration of time. (OS X does not drag correctly when the mouse moves too quickly, so passing a duration keyword argument is recommended.)
To try these functions, open a graphics-drawing application such as Paint on Windows, Paintbrush on OS X, or GNU Paint on Linux. (If you dont have a drawing application, you can use the online one at http://sumopaint.com/.) I will use PyAutoGUI to draw in these applications.
With the mouse cursor over the drawing applications canvas and the Pencil or Brush tool selected, enter the following into a new file editor window and save it as spiralDraw.py:
When you run this program, there will be a five-second delay for you to move the mouse cursor over the drawing programs window with the Pencil or Brush tool selected. Then spiralDraw.py will take control of the mouse and click to put the drawing program in focus . A window is in focus when it has an active blinking cursor, and the actions you takelike typing or, in this case, dragging the mousewill affect that window. Once the drawing program is in focus, spiralDraw.py draws a square spiral pattern like the one in Figure18-2.
Figure18-2.The results from the pyautogui.dragRel() example
The distance variable starts at 200, so on the first iteration of the while loop, the first dragRel() call drags the cursor 200 pixels to the right, taking 0.2 seconds . distance is then decreased to 195 , and the second dragRel() call drags the cursor 195 pixels down . The third dragRel() call drags the cursor 195 horizontally (195 to the left) , distance is decreased to 190, and the last dragRel() call drags the cursor 190 pixels up. On each iteration, the mouse is dragged right, down, left, and up, and distance is slightly smaller than it was in the previous iteration. By looping over this code, you can move the mouse cursor to draw a square spiral.
You could draw this spiral by hand (or rather, by mouse), but youd have to work slowly to be so precise. PyAutoGUI can do it in a few seconds!
You could have your code draw the image using the pillow modules drawing functionssee Chapter17 for more information. But using GUI automation allows you to make use of the advanced drawing tools that graphics programs can provide, such as gradients, different brushes, or the fill bucket.
The final PyAutoGUI mouse function is scroll(), which you pass an integer argument for how many units you want to scroll the mouse up or down. The size of a unit varies for each operating system and application, so youll have to experiment to see exactly how far it scrolls in your particular situation. The scrolling takes place at the mouse cursors current position. Passing a positive integer scrolls up, and passing a negative integer scrolls down. Run the following in IDLEs interactive shell while the mouse cursor is over the IDLE window:
Youll see IDLE briefly scroll upwardand then go back down. The downward scrolling happens because IDLE automatically scrolls down to the bottom after executing an instruction. Enter this code instead:
This imports pyperclip and sets up an empty string, numbers. The code then loops through 200 numbers and adds each number to numbers, along with a newline. After pyperclip.copy(numbers), the clipboard will be loaded with 200 lines of numbers. Open a new file editor window and paste the text into it. This will give you a large text window to try scrolling in. Enter the following code into the interactive shell:
On the second line, you enter two commands separated by a semicolon, which tells Python to run the commands as if they were on separate lines. The only difference is that the interactive shell wont prompt you for input between the two instructions. This is important for this example because we want to the call to pyautogui.scroll() to happen automatically after the wait. (Note that while putting two commands on one line can be useful in the interactive shell, you should still have each instruction on a separate line in your programs.)
After pressing ENTER to run the code, you will have five seconds to click the file editor window to put it in focus. Once the pause is over, the pyautogui.scroll() call will cause the file editor window to scroll up after the five-second delay.
Your GUI automation programs dont have to click and type blindly. PyAutoGUI has screenshot features that can create an image file based on the current contents of the screen. These functions can also return a Pillow Image object of the current screens appearance. If youve been skipping around in this book, youll want to read Chapter17 and install the pillow module before continuing with this section.
On Linux computers, the scrot program needs to be installed to use the screenshot functions in PyAutoGUI. In a Terminal window, run sudo apt-get install scrot to install this program. If youre on Windows or OS X, skip this step and continue with the section.
To take screenshots in Python, call the pyautogui.screenshot() function. Enter the following into the interactive shell:
The im variable will contain the Image object of the screenshot. You can now call methods on the Image object in the im variable, just like any other Image object. Enter the following into the interactive shell:
Pass getpixel() a tuple of coordinates, like (0, 0) or (50, 200), and itll tell you the color of the pixel at those coordinates in your image. The return value from getpixel() is an RGB tuple of three integers for the amount of red, green, and blue in the pixel. (There is no fourth value for alpha, because screenshot images are fully opaque.) This is how your programs can see what is currently on the screen.
Say that one of the steps in your GUI automation program is to click a gray button. Before calling the click() method, you could take a screenshot and look at the pixel where the script is about to click. If its not the same gray as the gray button, then your program knows something is wrong. Maybe the window moved unexpectedly, or maybe a pop-up dialog has blocked the button. At this point, instead of continuingand possibly wreaking havoc by clicking the wrong thingyour program can see that it isnt clicking on the right thing and stop itself.
PyAutoGUIs pixelMatchesColor() function will return True if the pixel at the given x- and y-coordinates on the screen matches the given color. The first and second arguments are integers for the x- and y-coordinates, and the third argument is a tuple of three integers for the RGB color the screen pixel must match. Enter the following into the interactive shell:
After taking a screenshot and using getpixel() to get an RGB tuple for the color of a pixel at specific coordinates , pass the same coordinates and RGB tuple to pixelMatchesColor() , which should return True. Then change a value in the RGB tuple and call pixelMatchesColor() again for the same coordinates . This should return false. This method can be useful to call whenever your GUI automation programs are about to call click(). Note that the color at the given coordinates must exactly match. If it is even slightly differentfor example, (255, 255, 254) instead of (255, 255, 255)then pixelMatchesColor() will return False.
You could extend the mouseNow.py project from earlier in this chapter so that it not only gives the x- and y-coordinates of the mouse cursors current position but also gives the RGB color of the pixel under the cursor. Modify the code inside the while loop of mouseNow.py to look like this:
Now, when you run mouseNow.py, the output will include the RGB color value of the pixel under the mouse cursor.
This information, along with the pixelMatchesColor() function, should make it easy to add pixel color checks to your GUI automation scripts.
But what if you do not know beforehand where PyAutoGUI should click? You can use image recognition instead. Give PyAutoGUI an image of what you want to click and let it figure out the coordinates.
For example, if you have previously taken a screenshot to capture the image of a Submit button in submit.png, the locateOnScreen() function will return the coordinates where that image is found. To see how locateOnScreen() works, try taking a screenshot of a small area on your screen; then save the image and enter the following into the interactive shell, replacing 'submit. png' with the filename of your screenshot:
The four-integer tuple that locateOnScreen() returns has the x-coordinate of the left edge, the y-coordinate of the top edge, the width, and the height for the first place on the screen the image was found. If youre trying this on your computer with your own screenshot, your return value will be different from the one shown here.
If the image cannot be found on the screen, locateOnScreen() will return None. Note that the image on the screen must match the provided image perfectly in order to be recognized. If the image is even a pixel off, locateOnScreen() will return None.
If the image can be found in several places on the screen, locateAllOnScreen() will return a Generator object, which can be passed to list() to return a list of four-integer tuples. There will be one four-integer tuple for each location where the image is found on the screen. Continue the interactive shell example by entering the following (and replacing 'submit.png' with your own image filename):
Each of the four-integer tuples represents an area on the screen. If your image is only found in one area, then using list() and locateAllOnScreen() just returns a list containing one tuple.
Once you have the four-integer tuple for the area on the screen where your image was found, you can click the center of this area by passing the tuple to the center() function to return x- and y-coordinates of the areas center. Enter the following into the interactive shell, replacing the arguments with your own filename, four-integer tuple, and coordinate pair:
Once you have center coordinates from center(), passing the coordinates to click() should click the center of the area on the screen that matches the image you passed to locateOnScreen().
PyAutoGUI also has functions for sending virtual keypresses to your computer, which enables you to fill out forms or enter text into applications.
The pyautogui.typewrite() function sends virtual keypresses to the computer. What these keypresses do depends on what window and text field have focus. You may want to first send a mouse click to the text field you want in order to ensure that it has focus.
As a simple example, lets use Python to automatically type the words Hello world! into a file editor window. First, open a new file editor window and position it in the upper-left corner of your screen so that PyAutoGUI will click in the right place to bring it into focus. Next, enter the following into the interactive shell:
Notice how placing two commands on the same line, separated by a semicolon, keeps the interactive shell from prompting you for input between running the two instructions. This prevents you from accidentally bringing a new window into focus between the click() and typewrite() calls, which would mess up the example.
Python will first send a virtual mouse click to the coordinates (100, 100), which should click the file editor window and put it in focus. The typewrite() call will send the text Hello world! to the window, making it look like Figure18-3. You now have code that can type for you!
Figure18-3.Using PyAutogGUI to click the file editor window and type Hello world! into it
By default, the typewrite() function will type the full string instantly. However, you can pass an optional second argument to add a short pause between each character. This second argument is an integer or float value of the number of seconds to pause. For example, pyautogui.typewrite('Hello world!', 0.25) will wait a quarter-second after typing H, another quarter-second after e, and so on. This gradual typewriter effect may be useful for slower applications that cant process keystrokes fast enough to keep up with PyAutoGUI.
For characters such as A or !, PyAutoGUI will automatically simulate holding down the SHIFT key as well.
Not all keys are easy to represent with single text characters. For example, how do you represent SHIFT or the left arrow key as a single character? In PyAutoGUI, these keyboard keys are represented by short string values instead: 'esc' for the ESC key or 'enter' for the ENTER key.
Instead of a single string argument, a list of these keyboard key strings can be passed to typewrite(). For example, the following call presses the A key, then the B key, then the left arrow key twice, and finally the X and Y keys:
Because pressing the left arrow key moves the keyboard cursor, this will output XYab. Table18-1 lists the PyAutoGUI keyboard key strings that you can pass to typewrite() to simulate pressing any combination of keys.
You can also examine the pyautogui.KEYBOARD_KEYS list to see all possible keyboard key strings that PyAutoGUI will accept. The 'shift' string refers to the left SHIFT key and is equivalent to 'shiftleft'. The same applies for 'ctrl', 'alt', and 'win' strings; they all refer to the left-side key.
Table18-1.PyKeyboard Attributes
Keyboard key string
Meaning
Link:
Automate the Boring Stuff with Python