Documentation ¶
Overview ¶
Package gopilot provides a simple and minimalistic API for automating Chromium browsers.
gopilot is a lightweight alternative to complex browser automation tools, focusing on essential functionality using the Chrome DevTools Protocol (CDP). It's structured around three main components: Browser (manages instances), Page (represents tabs), and Element (represents DOM elements).
Key features include navigation, DOM manipulation, element interaction, screenshots, and network request monitoring. The package supports both headful (default) and headless modes, and can be configured via environment variables like GOPILOT_CHROME_EXECUTABLE.
Common use cases include web scraping, UI testing, form automation, and taking screenshots.
For examples and detailed usage, see: https://github.com/falmar/gopilot/tree/main/examples
Index ¶
- Variables
- type BoundingRect
- type Browser
- type BrowserConfig
- type BrowserGetPagesInput
- type BrowserGetPagesOutput
- type BrowserNewPageInput
- type BrowserNewPageOutput
- type BrowserOpenInput
- type ClearLocalStorageInput
- type ClearLocalStorageOutput
- type DispatchEventType
- type Element
- type ElementClickInput
- type ElementClickOutput
- type ElementDOM
- type ElementInput
- type ElementScrollIntoViewInput
- type ElementScrollIntoViewOutput
- type ElementTakeScreenshotInput
- type ElementTakeScreenshotOutput
- type GetCookiesInput
- type GetCookiesOutput
- type GetLocalStorageInput
- type GetLocalStorageOutput
- type InterceptRequestCallback
- type InterceptRequestHandle
- type InterceptResponseCallback
- type InterceptResponseHandle
- type LocalStorageItem
- type Page
- type PageCookie
- type PageDOM
- type PageEvaluateInput
- type PageEvaluateOutput
- type PageFetch
- type PageInput
- type PageNavigateInput
- type PageNavigateOutput
- type PageNavigation
- type PageQuerySelectorInput
- type PageQuerySelectorOutput
- type PageReloadInput
- type PageReloadOutput
- type PageSearchInput
- type PageSearchOutput
- type PageStorage
- type PageTakeScreenshotInput
- type PageTakeScreenshotOutput
- type PageTypeTextInput
- type PageTypeTextOutput
- type SetCookiesInput
- type SetCookiesOutput
- type SetLocalStorageInput
- type SetLocalStorageOutput
- type TypeDelayFunc
- type XHREvent
- type XHRMonitor
Constants ¶
This section is empty.
Variables ¶
var (
	ErrElementNotFound      = errors.New("page search: element not found")
	ErrElementSearchTimeout = errors.New("page search: timeout")
)
Functions ¶
This section is empty.
Types ¶
type BoundingRect ¶
type BoundingRect struct {
// Top is the distance from the top of the viewport to the top of the element.
Top float64 `json:"top"`
// Left is the distance from the left of the viewport to the left of the element.
Left float64 `json:"left"`
// Bottom is the distance from the top of the viewport to the bottom of the element.
Bottom float64 `json:"bottom"`
// Right is the distance from the left of the viewport to the right of the element.
Right float64 `json:"right"`
// X is the horizontal coordinate of the element.
X float64 `json:"x"`
// Y is the vertical coordinate of the element.
Y float64 `json:"y"`
// Width is the width of the element.
Width float64 `json:"width"`
// Height is the height of the element.
Height float64 `json:"height"`
// CenterX is the centered position on x-axis
CenterX float64
// CenterY is the centered position on y-axis
CenterY float64
}
BoundingRect represents the bounding box of an element on the page. It contains the coordinates of the edges and dimensions of the element.
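The JSON tags suggest the edge and size fields are decoded from a payload shaped like the browser's bounding client rect, while CenterX and CenterY carry no tag. As a rough sketch, assuming the center is simply the midpoint of the box (an assumption, not confirmed by the source), the fields could be filled in as follows. The rect type and parseRect function are local stand-ins, not gopilot identifiers.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rect is a local stand-in mirroring the documented BoundingRect fields.
type rect struct {
	Top     float64 `json:"top"`
	Left    float64 `json:"left"`
	Bottom  float64 `json:"bottom"`
	Right   float64 `json:"right"`
	X       float64 `json:"x"`
	Y       float64 `json:"y"`
	Width   float64 `json:"width"`
	Height  float64 `json:"height"`
	CenterX float64
	CenterY float64
}

// parseRect decodes a rect payload and derives the center point.
// Assumption: the center is the midpoint of the box.
func parseRect(raw []byte) (rect, error) {
	var r rect
	if err := json.Unmarshal(raw, &r); err != nil {
		return rect{}, err
	}
	r.CenterX = r.Left + r.Width/2
	r.CenterY = r.Top + r.Height/2
	return r, nil
}

func main() {
	raw := []byte(`{"top":10,"left":20,"bottom":50,"right":120,"x":20,"y":10,"width":100,"height":40}`)
	r, err := parseRect(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(r.CenterX, r.CenterY) // 70 30
}
```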
type Browser ¶
type Browser interface {
// Open initiates a new browser session.
// It takes a context and BrowserOpenInput as parameters.
// Returns an error if the browser fails to start.
Open(ctx context.Context, in *BrowserOpenInput) error
// NewPage creates a new page or tab in the browser.
// Accepts context and BrowserNewPageInput to specify creation parameters.
// Returns a BrowserNewPageOutput containing the newly created page
// or an error if the page cannot be created.
NewPage(ctx context.Context, in *BrowserNewPageInput) (*BrowserNewPageOutput, error)
// GetPages retrieves only pages created by this session (tracked pages).
// These are pages created with NewTab: true. Calling Close() on these pages will close them.
// Returns a BrowserGetPagesOutput with a list of session pages or an error if retrieving fails.
GetPages(ctx context.Context, in *BrowserGetPagesInput) (*BrowserGetPagesOutput, error)
// GetAllPages retrieves ALL pages in the browser, including non-session pages.
// Pages returned are NOT session-tracked, and calling Close() on them is a no-op.
// Use this for inspection/debugging. For pages created by this session, use GetPages().
GetAllPages(ctx context.Context, in *BrowserGetPagesInput) (*BrowserGetPagesOutput, error)
// Close shuts down the browser instance and cleans up any resources.
// Only closes session pages (pages created by this instance with NewTab: true).
// For external browsers, closes session pages but leaves the browser running.
Close(ctx context.Context) error
// GetDevToolClient retrieves the DevTools client associated with the browser.
// This client allows for advanced interactions with the browser's DevTools protocol,
// enabling custom actions and low-level debugging or profiling features.
GetDevToolClient() *devtool.DevTools
}
Browser defines a contract for browser operations. It allows managing browser instances and interacting with web pages.
func NewBrowser ¶
func NewBrowser(cfg *BrowserConfig, logger *slog.Logger) Browser
NewBrowser creates a new browser instance with the given configuration and logger.
type BrowserConfig ¶
type BrowserConfig struct {
// Path specifies the path to the browser executable.
Path string
// DebugPort specifies the port for debugging connections.
DebugPort string
// Args contains additional command-line arguments to pass when launching the browser.
Args []string
// Envs holds any environment variables to set for the browser process.
Envs []string
// OpenTimeout defines how long to wait for Chrome to print the "DevTools listening on" message during startup.
// If nil, a default of 5 seconds is used. Increase this if your environment starts Chrome slowly.
OpenTimeout *time.Duration
// CloseTimeout defines how long to wait for the Chrome process to terminate during shutdown.
// If nil, a default of 5 seconds is used. Increase this if your environment needs more time to exit cleanly.
CloseTimeout *time.Duration
// ConnectionURL specifies the URL of an existing Chrome/Chromium browser to connect to.
// When set, gopilot will connect to the existing browser instead of launching a new process.
// Supports both WebSocket URLs (ws://127.0.0.1:9222/devtools/browser/UUID) and HTTP (http://127.0.0.1:9222).
// The external browser will NOT be closed when Browser.Close() is called.
//
// Example:
// cfg := gopilot.NewBrowserConfig()
// cfg.ConnectionURL = "http://127.0.0.1:9222"
ConnectionURL string
}
BrowserConfig holds configuration settings for launching a browser instance.
func NewBrowserConfig ¶
func NewBrowserConfig() *BrowserConfig
NewBrowserConfig creates a new BrowserConfig with default settings. The default Path is "chromium" and the default DebugPort is "9222". It includes several default command-line arguments for browser startup.
func (*BrowserConfig) AddArgument ¶
func (c *BrowserConfig) AddArgument(arg string)
AddArgument appends an additional command-line argument to the browser configuration. This allows users to customize the launch options for the browser instance.
func (*BrowserConfig) EnableHeadless ¶
func (c *BrowserConfig) EnableHeadless()
EnableHeadless configures the browser to start in headless mode.
type BrowserGetPagesInput ¶
type BrowserGetPagesInput struct{}
BrowserGetPagesInput represents parameters to obtain open pages.
type BrowserGetPagesOutput ¶
type BrowserGetPagesOutput struct {
Pages []Page
}
BrowserGetPagesOutput contains the list of open browser pages.
type BrowserNewPageInput ¶
BrowserNewPageInput contains parameters for creating a new page.
type BrowserNewPageOutput ¶
type BrowserNewPageOutput struct {
Page Page
}
BrowserNewPageOutput contains the result of creating a new page.
type BrowserOpenInput ¶
type BrowserOpenInput struct{}
BrowserOpenInput contains parameters required to open a browser.
type ClearLocalStorageInput ¶
type ClearLocalStorageInput struct{}
type ClearLocalStorageOutput ¶
type ClearLocalStorageOutput struct{}
type DispatchEventType ¶
type DispatchEventType string
const (
	DispatchEventTypeKeyDown    DispatchEventType = "keyDown"
	DispatchEventTypeKeyUp      DispatchEventType = "keyUp"
	DispatchEventTypeRawKeyDown DispatchEventType = "rawKeyDown"
	DispatchEventTypeChar       DispatchEventType = "char"
)
type Element ¶
type Element interface {
ElementInput
ElementDOM
// TakeScreenshot captures a screenshot of the element.
// It uses the element's position and size to define the capture area.
// Input parameters can specify the format of the image.
// Returns the screenshot data as bytes or an error if the capture fails.
TakeScreenshot(ctx context.Context, in *ElementTakeScreenshotInput) (*ElementTakeScreenshotOutput, error)
// GetNodeID returns the DOM node ID currently associated with the element.
GetNodeID(ctx context.Context) dom.NodeID
}
Element represents an interactive element in a web page.
type ElementClickInput ¶
type ElementClickInput struct {
StepDuration time.Duration // Duration for each step of the click action.
HoldDuration time.Duration // Duration to hold the mouse press before releasing.
ReturnHoldRelease bool // Return a release function to let user decide when to release mouse press
}
ElementClickInput specifies the input parameters for simulating a click on an element.
- StepDuration: duration to wait between each step of the click process: moving to the element, mouse press, and mouse release.
- HoldDuration: duration to wait between mouse press and mouse release. Defaults to StepDuration if not set.
- ReturnHoldRelease: if true, the click returns a release function so the caller decides when to release the mouse press.
type ElementClickOutput ¶
type ElementClickOutput struct {
X float64 `json:"x"` // X coordinate of the click position.
Y float64 `json:"y"` // Y coordinate of the click position.
Release func() error
}
ElementClickOutput represents the output of a click action. It provides the X and Y coordinates where the click occurred and a Release function for releasing the mouse press when the caller controls the release.
type ElementDOM ¶
type ElementDOM interface {
// HTML retrieves the element's outer HTML content
HTML(ctx context.Context) (string, error)
// Text retrieves the element's text content.
Text(ctx context.Context) (string, error)
// Focus sets focus on the element, allowing it to receive input.
// Returns an error if the action fails.
Focus(ctx context.Context) error
// ScrollIntoView performs an action to scroll the element into the viewport.
// Accepts an ElementScrollIntoViewInput with scroll parameters.
// Returns an ElementScrollIntoViewOutput or an error if the action fails.
ScrollIntoView(ctx context.Context, in *ElementScrollIntoViewInput) (*ElementScrollIntoViewOutput, error)
// GetRect retrieves the bounding rectangle of the element.
// Returns a BoundingRect containing the dimensions and position of the element, or an error if retrieval fails.
GetRect(ctx context.Context) (*BoundingRect, error)
// Remove the element from the DOM tree
Remove(ctx context.Context) error
}
type ElementInput ¶
type ElementInput interface {
// Click simulates a mouse click on the element.
// Accepts an ElementClickInput containing details for the click action.
// Returns an ElementClickOutput with the result or an error if the click fails.
Click(ctx context.Context, in *ElementClickInput) (*ElementClickOutput, error)
}
type ElementScrollIntoViewInput ¶
type ElementScrollIntoViewInput struct{}
ElementScrollIntoViewInput contains parameters for the ScrollIntoView action.
type ElementScrollIntoViewOutput ¶
type ElementScrollIntoViewOutput struct {
// X is the X coordinate of the scroll-to view position.
X float64 `json:"x"`
// Y is the Y coordinate of the scroll-to view position.
Y float64 `json:"y"`
}
ElementScrollIntoViewOutput contains the result of the ScrollIntoView action.
type ElementTakeScreenshotInput ¶
type ElementTakeScreenshotInput struct {
// Format specifies the desired image format for the screenshot.
// Common formats include "png" and "jpeg".
Format string
}
ElementTakeScreenshotInput specifies input parameters for taking a screenshot of an element.
type ElementTakeScreenshotOutput ¶
type ElementTakeScreenshotOutput struct {
// Data contains the base64 encoded screenshot image data.
Data []byte
}
ElementTakeScreenshotOutput represents the output of the TakeScreenshot method for an element.
type GetCookiesInput ¶
type GetCookiesInput struct{}
GetCookiesInput specifies the input for the GetCookies method.
type GetCookiesOutput ¶
type GetCookiesOutput struct {
Cookies []PageCookie // List of cookies.
}
GetCookiesOutput contains the list of cookies retrieved from the browser.
type GetLocalStorageInput ¶
type GetLocalStorageInput struct{}
type GetLocalStorageOutput ¶
type GetLocalStorageOutput struct {
Items []LocalStorageItem
}
type InterceptRequestCallback ¶
type InterceptRequestCallback func(ctx context.Context, req *fetch.RequestPausedReply, continueArgs *fetch.ContinueRequestArgs) (*fetch.FulfillRequestArgs, error)
InterceptRequestCallback is a function type for request interception. The callback receives details about the paused request and can modify it or provide a custom response. Return values:
- (nil, nil): Continue the request with any modifications made to continueArgs
- (nil, error): Abort the request with the given error
- (*fetch.FulfillRequestArgs, nil): Fulfill the request with a custom response
type InterceptRequestHandle ¶
type InterceptRequestHandle struct {
// contains filtered or unexported fields
}
InterceptRequestHandle is a handle for managing request interception callbacks.
type InterceptResponseCallback ¶
type InterceptResponseCallback func(ctx context.Context, req *fetch.RequestPausedReply, continueArgs *fetch.ContinueResponseArgs) error
InterceptResponseCallback is a function type for response interception. The callback receives details about the paused response and can modify it. If an error is returned, the response processing will be interrupted.
type InterceptResponseHandle ¶
type InterceptResponseHandle struct {
// contains filtered or unexported fields
}
InterceptResponseHandle is a handle for managing response interception callbacks.
type LocalStorageItem ¶
type Page ¶
type Page interface {
PageNavigation
PageDOM
PageFetch
PageStorage
PageInput
// Close closes the page.
// Returns an error if closing the page fails.
Close(ctx context.Context) error
// Evaluate runs JavaScript on the page.
// Takes a PageEvaluateInput and returns a PageEvaluateOutput or an error.
Evaluate(ctx context.Context, in *PageEvaluateInput) (*PageEvaluateOutput, error)
// TakeScreenshot captures a screenshot of the page.
// You can choose to capture the entire page or just the visible viewport.
// Input parameters allow you to specify the image format and capture area.
// Returns the screenshot data or an error if the capture fails.
TakeScreenshot(ctx context.Context, in *PageTakeScreenshotInput) (*PageTakeScreenshotOutput, error)
// GetTargetID returns the unique identifier for the page's target.
// This ID can be used to distinguish different pages or targets in the browser.
GetTargetID() string
// GetCDPClient retrieves the Chrome DevTools Protocol (CDP) client associated with the page.
// The CDP client allows for direct communication with the browser's protocol.
// This is useful for performing low-level operations and custom actions not exposed by higher-level methods.
GetCDPClient() *cdp.Client
}
Page represents a web page in the browser.
type PageCookie ¶
type PageCookie struct {
Name string // The name of the cookie.
Value string // The value of the cookie.
Domain string // The domain the cookie is associated with.
Path string // The path the cookie is accessible from.
Size int // The size of the cookie in bytes.
Expires *time.Time // The expiration time of the cookie.
Secure bool // Indicates if the cookie is secure (only sent over HTTPS).
HttpOnly bool // Indicates if the cookie is accessible via HTTP only (not accessible via JavaScript).
Session bool // Indicates if the cookie is a session cookie.
}
PageCookie represents a cookie in the browser. It includes details such as name, value, domain, path, expiration, and security features.
type PageDOM ¶
type PageDOM interface {
// GetContent retrieves the HTML content of the page as a string.
// Returns the content or an error if retrieving fails.
GetContent(ctx context.Context) (string, error)
// SetContent replaces the current DOM with supplied content
SetContent(ctx context.Context, content string) error
// QuerySelector finds an element matching the selector.
// Takes a PageQuerySelectorInput and returns a PageQuerySelectorOutput or an error.
QuerySelector(ctx context.Context, in *PageQuerySelectorInput) (*PageQuerySelectorOutput, error)
// Search finds an element matching the text, query selector or xpath
// Takes a PageSearchInput and returns a PageSearchOutput or an error.
Search(ctx context.Context, in *PageSearchInput) (*PageSearchOutput, error)
}
type PageEvaluateInput ¶
PageEvaluateInput specifies input for the Evaluate method.
type PageEvaluateOutput ¶
type PageEvaluateOutput struct {
Value json.RawMessage
}
PageEvaluateOutput represents the output of the Evaluate method.
type PageFetch ¶
type PageFetch interface {
// EnableFetch enables network fetch interception.
// Returns an error if enabling fails.
EnableFetch(ctx context.Context) error
// DisableFetch disables network fetch interception.
// Returns an error if disabling fails.
DisableFetch(ctx context.Context) error
// AddInterceptRequest adds a request interception callback.
// Returns a handle that can be used to remove the callback later.
AddInterceptRequest(ctx context.Context, cb InterceptRequestCallback) *InterceptRequestHandle
// RemoveInterceptRequest removes a request interception callback.
// The handle parameter should be the value returned by AddInterceptRequest.
RemoveInterceptRequest(ctx context.Context, handle *InterceptRequestHandle)
// AddInterceptResponse adds a response interception callback.
// Returns a handle that can be used to remove the callback later.
AddInterceptResponse(ctx context.Context, cb InterceptResponseCallback) *InterceptResponseHandle
// RemoveInterceptResponse removes a response interception callback.
// The handle parameter should be the value returned by AddInterceptResponse.
RemoveInterceptResponse(ctx context.Context, handle *InterceptResponseHandle)
}
type PageInput ¶
type PageInput interface {
// TypeText sends a sequence of keystrokes to the page as if typed by a user.
// Accepts a PageTypeTextInput containing the text to type.
// Returns a PageTypeTextOutput with the result or an error if typing fails.
TypeText(ctx context.Context, in *PageTypeTextInput) (*PageTypeTextOutput, error)
}
type PageNavigateInput ¶
type PageNavigateInput struct {
}
PageNavigateInput specifies the input for the Navigate method. URL is the target URL to navigate to. WaitDomContentLoad determines whether to wait for the DOM content to load.
type PageNavigateOutput ¶
type PageNavigateOutput struct {
}
PageNavigateOutput represents the output of the Navigate method. LoaderID is the ID associated with the loading process of the page.
type PageNavigation ¶
type PageNavigation interface {
Activate(ctx context.Context) error
// Navigate navigates the page to a URL.
// The input is a PageNavigateInput containing the URL to navigate to.
// It returns a PageNavigateOutput or an error if the navigation fails.
Navigate(ctx context.Context, in *PageNavigateInput) (*PageNavigateOutput, error)
// Reload reloads the page.
// It takes a PageReloadInput and returns a PageReloadOutput or an error.
Reload(ctx context.Context, in *PageReloadInput) (*PageReloadOutput, error)
}
type PageQuerySelectorInput ¶
type PageQuerySelectorInput struct {
Selector string
}
PageQuerySelectorInput contains the selector string for querying elements.
type PageQuerySelectorOutput ¶
type PageQuerySelectorOutput struct {
Element Element
}
PageQuerySelectorOutput contains the Element found by the query.
type PageReloadInput ¶
type PageReloadInput struct {
LoaderID network.LoaderID // The LoaderID of the previous load.
WaitDomContentLoad bool // If true, waits for the DOM content to load after reload.
}
PageReloadInput specifies the input for the Reload method. LoaderID is the ID associated with the previous loading process. WaitDomContentLoad determines whether to wait for the DOM content to load after reloading.
type PageReloadOutput ¶
type PageReloadOutput struct{}
PageReloadOutput represents the output of the Reload method.
type PageSearchInput ¶
type PageSearchInput struct {
Selector string // selector for querying the element.
Pierce bool // Include User Agent Shadow DOM if true.
WaitDuration time.Duration // Max duration to wait for an element to be present.
TickDuration time.Duration // Duration between search attempts; defaults to 1 second if unset.
}
PageSearchInput contains the selector and the timing options used when searching for an element.
type PageSearchOutput ¶
type PageSearchOutput struct {
Element Element // The first element matching the query.
}
PageSearchOutput contains the Element found by the query.
type PageStorage ¶
type PageStorage interface {
// GetCookies retrieves cookies for the current page.
// Takes a GetCookiesInput and returns GetCookiesOutput or an error.
GetCookies(ctx context.Context, in *GetCookiesInput) (*GetCookiesOutput, error)
// SetCookies sets cookies for the current page.
// Takes a SetCookiesInput and returns SetCookiesOutput or an error.
SetCookies(ctx context.Context, in *SetCookiesInput) (*SetCookiesOutput, error)
// ClearCookies clears cookies for the current page.
ClearCookies(ctx context.Context) error
GetLocalStorage(ctx context.Context, in *GetLocalStorageInput) (*GetLocalStorageOutput, error)
SetLocalStorage(ctx context.Context, in *SetLocalStorageInput) (*SetLocalStorageOutput, error)
ClearLocalStorage(ctx context.Context) error
}
type PageTakeScreenshotInput ¶
type PageTakeScreenshotInput struct {
// Format specifies the desired image format for the screenshot.
// Options could be "png" or "jpeg".
Format string
// Full determines whether to capture the entire page or only the current viewport.
Full bool
// Viewport allows specifying a custom area of the page to capture.
Viewport *cdppage.Viewport
}
PageTakeScreenshotInput specifies input parameters for taking a screenshot of a page.
type PageTakeScreenshotOutput ¶
type PageTakeScreenshotOutput struct {
// Data contains the screenshot image data.
Data []byte
}
PageTakeScreenshotOutput represents the output of the TakeScreenshot method for a page.
type PageTypeTextInput ¶
type PageTypeTextInput struct {
Text string // The text to be typed into the page.
Delay time.Duration // (optional) Duration between keystrokes.
DelayFunc TypeDelayFunc // (optional) Custom function for typing delays.
UseRawKeyDown bool
}
PageTypeTextInput specifies input for the TypeText method. Text is the string to type into the page. Delay is the duration between keystrokes. DelayFunc is a custom function that controls typing delays.
type PageTypeTextOutput ¶
type PageTypeTextOutput struct{}
PageTypeTextOutput represents the output of the TypeText method. It is currently empty, but can be extended to provide additional details of the typing operation.
type SetCookiesInput ¶
type SetCookiesInput struct {
Cookies []PageCookie // List of cookies to set.
}
SetCookiesInput specifies the input for the SetCookies method. It contains a list of cookies to set in the browser.
type SetCookiesOutput ¶
type SetCookiesOutput struct{}
SetCookiesOutput is returned after setting cookies successfully.
type SetLocalStorageInput ¶
type SetLocalStorageInput struct {
Items []LocalStorageItem
}
type SetLocalStorageOutput ¶
type SetLocalStorageOutput struct{}
type TypeDelayFunc ¶
type XHREvent ¶
type XHREvent struct {
URL string `json:"url"` // The URL that was requested
Body string `json:"body"` // The body of the response
Base64 bool `json:"base64"` // Indicates if the response body is Base64 encoded
Error error `json:"-"` // Error encountered during the request (if any)
}
XHREvent represents an XHR event with related information.
type XHRMonitor ¶
type XHRMonitor interface {
// Listen starts listening for XHR events that match the provided patterns.
// It returns a channel of XHREvent and an error if the operation fails.
Listen(ctx context.Context, patterns []string) (chan *XHREvent, error)
// Stop stops monitoring the XHR requests.
// Returns an error if stopping fails.
Stop(ctx context.Context) error
}
XHRMonitor is an interface for monitoring XHR requests.
func NewXHRMonitor ¶
func NewXHRMonitor(p Page) XHRMonitor
NewXHRMonitor creates a new XHRMonitor instance. It takes a Page and returns an instance of XHRMonitor.
Directories ¶
| Path | Synopsis |
|---|---|
| examples | |
| examples/click_element (command) | |
| examples/cookies (command) | |
| examples/eval (command) | |
| examples/external_browser (command) | |
| examples/listen_xhr (command) | |
| examples/local_storage (command) | |
| examples/open_chrome (command) | |
| examples/open_url (command) | |
| examples/request_modifier (command) | |
| examples/screenshots (command) | |
| examples/search (command) | |
| examples/typing (command) | |