Documentation ¶
Index ¶
Constants ¶
const (
	// Baidu is the largest search engine in China, providing various services beyond search such as maps, news, images, etc.
	Baidu = "baidu"
	// Bing is a web search engine owned and handled by Microsoft. It provides web search services to users and has a variety of features such as search, video, images, maps, and more.
	Bing = "bing"
	// Google is globally recognized and is the most used search engine. It handles over three billion searches each day and offers services beyond search like Gmail, Google Docs, etc.
	Google = "google"
	// SoGou is another Chinese search engine, owned by Sohu, Inc. It's the default search engine of Tencent's QQ soso.com, sogou.com, and Firefox in China.
	SoGou = "sogou"
)
Declaration of constants that represent four different search engines.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type BaiduStrategy ¶ added in v0.1.19
type BaiduStrategy struct {
*UniversalStrategy
}
BaiduStrategy is a struct used for detecting web crawlers. It embeds a pointer to an instance of the UniversalStrategy from the crawlerdetect package. By embedding it, the BaiduStrategy struct can directly call the methods and use the properties of the `UniversalStrategy`, achieving behavior akin to inheritance.
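The effect of embedding can be illustrated with a minimal, self-contained sketch. The types below are simplified stand-ins declared locally for illustration, not the package's actual code; in particular, the stub CheckCrawler compares literal strings instead of performing DNS lookups.

```go
package main

import "fmt"

// Strategy mirrors the interface documented in this package.
type Strategy interface {
	CheckCrawler(ip string) (bool, error)
}

// UniversalStrategy is a simplified stand-in for the documented type.
type UniversalStrategy struct {
	Hosts []string
}

// CheckCrawler is a stub for illustration only: it treats Hosts as a list
// of literal IP strings so the example needs no network access.
func (s *UniversalStrategy) CheckCrawler(ip string) (bool, error) {
	for _, h := range s.Hosts {
		if h == ip {
			return true, nil
		}
	}
	return false, nil
}

// BaiduStrategy embeds *UniversalStrategy, so the Hosts field and the
// CheckCrawler method are promoted to BaiduStrategy.
type BaiduStrategy struct {
	*UniversalStrategy
}

func main() {
	b := &BaiduStrategy{
		UniversalStrategy: &UniversalStrategy{Hosts: []string{"192.0.2.10"}},
	}

	// The promoted method lets *BaiduStrategy satisfy Strategy with no extra code.
	var s Strategy = b
	ok, err := s.CheckCrawler("192.0.2.10")
	fmt.Println(ok, err)      // true <nil>
	fmt.Println(len(b.Hosts)) // 1 (promoted field)
}
```

Note that this is composition with method promotion rather than inheritance: the embedded value remains a separate field and can be reached explicitly as b.UniversalStrategy.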
func InitBaiduStrategy ¶ added in v0.1.19
func InitBaiduStrategy() *BaiduStrategy
InitBaiduStrategy creates and initializes an instance of the BaiduStrategy struct. It sets up the embedded UniversalStrategy with a predefined list of known crawler hostnames of Baidu, a popular search engine in China.
Returns:
- *BaiduStrategy: A pointer to an instance of the BaiduStrategy struct. Thanks to the predefined list of hosts in the `UniversalStrategy`, this `BaiduStrategy` instance is ready to detect crawlers from Baidu.
type BingStrategy ¶ added in v0.1.19
type BingStrategy struct {
*UniversalStrategy
}
BingStrategy is a struct that embeds a pointer to an instance of the UniversalStrategy from the crawlerdetect package. UniversalStrategy provides a general mechanism for detecting web crawlers.
Embedding the UniversalStrategy directly inside the BingStrategy struct promotes the methods and fields of the UniversalStrategy to BingStrategy, which lets BingStrategy act as a specialized version of the UniversalStrategy.
func InitBingStrategy ¶ added in v0.1.19
func InitBingStrategy() *BingStrategy
InitBingStrategy creates and initializes a BingStrategy instance.
Specifically, it creates a new BingStrategy and inside it, it initializes the embedded UniversalStrategy with a list of known hosts associated with a specific web crawler. In this case, the host "search.msn.com" is known to be associated with a web crawler from Microsoft's search engine, Bing.
Returns:
- *BingStrategy: A pointer to an instance of the BingStrategy struct, with the embedded UniversalStrategy initialized for detecting Bing's web crawler.
type GoogleStrategy ¶ added in v0.1.19
type GoogleStrategy struct {
*UniversalStrategy
}
GoogleStrategy is a struct that embeds a pointer to a UniversalStrategy from the crawlerdetect package. Embedding is the pattern Go uses for composition: GoogleStrategy has a UniversalStrategy whose methods and fields are promoted, so they can be used directly on a GoogleStrategy value. The purpose of embedding this specific UniversalStrategy is to reuse its predefined methods for detecting web crawlers based on a list of hosts.
func InitGoogleStrategy ¶ added in v0.1.19
func InitGoogleStrategy() *GoogleStrategy
InitGoogleStrategy is a function that initializes and returns a pointer to a GoogleStrategy instance. It specifically initializes the embedded UniversalStrategy field with a set of hosts that are known to be associated with Google's web crawlers. This setup is useful for systems looking to detect and possibly differentiate traffic originating from Google's crawlers.
Returns:
- *GoogleStrategy: A pointer to the newly created GoogleStrategy instance. This instance now contains a UniversalStrategy initialized with a predefined list of hosts known to be used by Google's crawlers.
Usage Notes:
- The hosts ('googlebot.com', 'google.com', 'googleusercontent.com') are chosen because they are commonly associated with Google's web crawling services. The intention is to recognize traffic from these hosts during crawler detection checks.
- This setup is particularly useful for SEO-sensitive websites or web applications that might want to tailor their responses based on whether the traffic is generated by human users or automated crawlers.
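As a rough illustration of tailoring a response, the sketch below classifies a connection by asking a Strategy about its IP address. The classify function and fakeStrategy type are hypothetical names introduced here; the fake avoids real DNS traffic so the example runs offline. The address format matches what http.Request.RemoteAddr provides ("host:port").

```go
package main

import (
	"fmt"
	"net"
)

// Strategy mirrors the interface documented in this package.
type Strategy interface {
	CheckCrawler(ip string) (bool, error)
}

// fakeStrategy is a hypothetical test double, not part of the package.
type fakeStrategy struct {
	crawlerIPs map[string]bool
}

func (f fakeStrategy) CheckCrawler(ip string) (bool, error) {
	return f.crawlerIPs[ip], nil
}

// classify extracts the IP from a "host:port" address and asks the
// strategy whether it belongs to a crawler.
func classify(s Strategy, remoteAddr string) (string, error) {
	ip, _, err := net.SplitHostPort(remoteAddr)
	if err != nil {
		return "", fmt.Errorf("parse remote address: %w", err)
	}
	isCrawler, err := s.CheckCrawler(ip)
	if err != nil {
		return "", err
	}
	if isCrawler {
		return "crawler", nil
	}
	return "visitor", nil
}

func main() {
	s := fakeStrategy{crawlerIPs: map[string]bool{"192.0.2.10": true}}
	for _, addr := range []string{"192.0.2.10:51234", "198.51.100.7:40000"} {
		kind, err := classify(s, addr)
		fmt.Println(addr, kind, err)
	}
}
```

With the real package, the value returned by InitGoogleStrategy would take the place of fakeStrategy. Behind a reverse proxy the connection address is the proxy's, so the client IP would have to come from a trusted forwarding header instead.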
type SogouStrategy ¶ added in v0.1.19
type SogouStrategy struct {
Hosts []string
}
SogouStrategy is a struct that holds information needed to check if an IP address is associated with a known web crawler. This is often used to differentiate between regular user traffic and automated crawlers, such as those used by search engines.
Fields:
- Hosts: A slice of strings where each string is a host that is known to be associated with a web crawler. For instance, "googlebot.com" for Google's crawler.
func InitSogouStrategy ¶ added in v0.1.19
func InitSogouStrategy() *SogouStrategy
InitSogouStrategy is a package-level function that initializes a SogouStrategy struct with predefined host names of known crawlers. This example uses "sogou.com" as a known crawler host.
Returns:
- *SogouStrategy: A pointer to a SogouStrategy instance with a prepopulated Hosts field.
func (*SogouStrategy) CheckCrawler ¶ added in v0.1.19
func (s *SogouStrategy) CheckCrawler(ip string) (bool, error)
CheckCrawler is a method linked to the SogouStrategy struct that attempts to verify if a given IP address belongs to a known web crawler defined in the struct's Hosts field.
Parameters:
- ip: The IP address to check against the list of known crawler hosts.
Returns:
- bool: Indicates whether the IP is a known crawler (`true`) or not (`false`).
- error: Any error encountered during the DNS look-up process.
The method performs a reverse DNS lookup of the IP address to ascertain if any associated hosts match the ones listed in the SogouStrategy's Hosts field using the matchHost method.
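The matchHost method is unexported and its implementation is not documented here. One plausible way to write such a helper, shown purely as an assumption, is a suffix match on label boundaries. The function signature and the hostnames in the sketch are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// matchHost is a hypothetical stand-in for the package's unexported helper.
// It returns the first hostname that equals a known host or is a subdomain of it.
func matchHost(known []string, hostnames []string) (string, bool) {
	for _, name := range hostnames {
		// Reverse lookups usually return fully qualified names with a
		// trailing dot, for example "spider-1.sogou.com.".
		clean := strings.ToLower(strings.TrimSuffix(name, "."))
		for _, k := range known {
			k = strings.ToLower(k)
			// Requiring a "." before the suffix rejects look-alikes
			// such as "evilsogou.com".
			if clean == k || strings.HasSuffix(clean, "."+k) {
				return clean, true
			}
		}
	}
	return "", false
}

func main() {
	known := []string{"sogou.com"}
	fmt.Println(matchHost(known, []string{"spider-1.sogou.com."}))
	fmt.Println(matchHost(known, []string{"evilsogou.com."}))
}
```

A plain substring check would accept names like "sogou.com.attacker.example", which is why the sketch anchors the comparison at the end of the hostname.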
type Strategy ¶
type Strategy interface {
// CheckCrawler is a method that takes an IP address as input and returns a boolean
// and an error. The boolean indicates whether the given IP address belongs to a crawler
// or bot, and the error provides details if something went wrong during the check.
// The implementation of this method should contain the logic for determining crawler activity.
CheckCrawler(ip string) (bool, error)
}
Strategy is an interface defining the methods that all crawler check strategies must implement. Different search engines may have their own implementation of the Strategy interface to accommodate their specific methods for detecting crawlers.
func InitCrawlerDetector ¶
InitCrawlerDetector is a function that retrieves a Strategy instance from a predefined map of strategies called strategyMap, based on the specified crawler string. Each key in the map is associated with a strategy that was created by its initialization function and stored in strategyMap beforehand. This function acts as a lookup to fetch the appropriate Strategy instance for a given crawler.
Parameters:
- crawler: A string that identifies the crawler whose Strategy instance needs to be retrieved. It acts as a key to the strategyMap.
Returns:
- Strategy: A Strategy instance associated with the provided crawler string. If the crawler string does not exist in the map, the function returns a nil value.
Usage Notes:
- The strategyMap is a global variable where the key is a string representing the crawler's name, and the value is an instance of a Strategy implementation specific to that crawler.
- The provided crawler string should match one of the keys in the strategyMap for the function to return a valid Strategy instance.
- If the crawler string is not found in the strategyMap or is misspelled, the function will return a nil value, which the caller must check for before proceeding to use the returned Strategy instance.
Example:
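- To retrieve the Strategy associated with Google, you would call: googleStrategy := InitCrawlerDetector(Google). The Google constant has the value "google"; a differently cased key such as "Google" only works if strategyMap contains that exact key.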
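The lookup and the nil check described in the usage notes can be sketched as follows. This is a simplified stand-in, not the package's code: it assumes strategyMap is keyed by the constants declared above, and stubStrategy is a hypothetical placeholder for the real strategies.

```go
package main

import "fmt"

const (
	Baidu  = "baidu"
	Google = "google"
)

// Strategy mirrors the interface documented in this package.
type Strategy interface {
	CheckCrawler(ip string) (bool, error)
}

// stubStrategy is a hypothetical placeholder implementation.
type stubStrategy struct{}

func (stubStrategy) CheckCrawler(string) (bool, error) { return false, nil }

// strategyMap is assumed to be keyed by the package constants.
var strategyMap = map[string]Strategy{
	Baidu:  stubStrategy{},
	Google: stubStrategy{},
}

// InitCrawlerDetector returns the registered strategy, or nil for an
// unknown key (the zero value of an interface type).
func InitCrawlerDetector(crawler string) Strategy {
	return strategyMap[crawler]
}

func main() {
	for _, key := range []string{Google, "Google", "yandex"} {
		s := InitCrawlerDetector(key)
		if s == nil {
			// Calling s.CheckCrawler here would panic, so check first.
			fmt.Printf("%q: no strategy registered\n", key)
			continue
		}
		fmt.Printf("%q: strategy found\n", key)
	}
}
```

Under the stated assumption, only the first key resolves; "Google" and "yandex" yield nil because map keys are compared exactly, including case.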
type UniversalStrategy ¶
type UniversalStrategy struct {
Hosts []string // Hosts is a slice of strings that contains the hostnames or IP addresses used to identify crawlers.
}
UniversalStrategy is a structure that holds information relevant to a generic approach for checking crawlers across different search engines.
func InitUniversalStrategy ¶
func InitUniversalStrategy(hosts []string) *UniversalStrategy
InitUniversalStrategy is a function that initializes a UniversalStrategy instance. It takes a slice of hosts as input, which represent the hostnames or IP addresses used to identify crawlers. The function returns a pointer to a new UniversalStrategy instance that stores this input data.
Parameters:
- hosts: A slice of strings that holds hostnames/IP addresses for crawler identification.
The function constructs a UniversalStrategy struct and sets its internal "Hosts" field to the input slice of hostnames/IP addresses.
func (*UniversalStrategy) CheckCrawler ¶
func (s *UniversalStrategy) CheckCrawler(ip string) (bool, error)
CheckCrawler is a method associated with the UniversalStrategy struct, intended to determine if a given IP address belongs to a known crawler, typically employed by search engines. It operates by performing a reverse lookup of the IP to obtain hostnames and then matching these against the UniversalStrategy's list of hosts that are known to be crawlers.
Parameters:
- ip: A string representing the IP address to be checked.
Returns:
- bool: True if the IP address is identified as a crawler, false otherwise.
- error: Any error encountered during the execution of the IP lookup or subsequent operations.
The method performs the following steps:
- It executes a reverse DNS lookup on the given IP address to retrieve associated hostnames.
- If an error occurs during the lookup, it returns false along with the error.
- If no hostnames are found, the IP cannot be linked to any crawler, so it returns false.
- If hostnames are found, it attempts to match them with known crawler hosts in the UniversalStrategy's list. This is done through a custom matchHost method that is not shown here.
- If there's no match, it returns false.
- If a match is found, it then performs a forward IP lookup for the matched hostname to verify the IP address.
- If the forward lookup yields an error, it returns false and the error.
- Finally, it checks if the list of IPs from the forward lookup of the hostname contains the original IP address. If so, it confirms the IP address belongs to a known crawler and returns true; otherwise, it returns false.
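The steps above amount to a forward-confirmed reverse DNS check and can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the resolver type, checkCrawler, and fakeResolver are hypothetical names, the suffix match stands in for the undocumented matchHost, and the hostnames and addresses are made-up documentation values. The lookups are injected so the example runs without network access.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// resolver holds the two DNS lookups so they can be replaced in tests.
type resolver struct {
	lookupAddr func(ip string) ([]string, error)
	lookupIP   func(host string) ([]net.IP, error)
}

// systemResolver uses the standard library lookups (requires network access).
var systemResolver = resolver{lookupAddr: net.LookupAddr, lookupIP: net.LookupIP}

func checkCrawler(r resolver, known []string, ip string) (bool, error) {
	names, err := r.lookupAddr(ip) // reverse lookup
	if err != nil {
		return false, err
	}
	matched := ""
	for _, name := range names {
		host := strings.ToLower(strings.TrimSuffix(name, "."))
		for _, k := range known {
			if host == k || strings.HasSuffix(host, "."+k) {
				matched = host
				break
			}
		}
		if matched != "" {
			break
		}
	}
	if matched == "" {
		return false, nil // no hostnames, or none belongs to a known crawler
	}
	addrs, err := r.lookupIP(matched) // forward lookup of the matched hostname
	if err != nil {
		return false, err
	}
	want := net.ParseIP(ip)
	for _, a := range addrs {
		if want != nil && a.Equal(want) {
			return true, nil // forward result confirms the original IP
		}
	}
	return false, nil
}

// fakeResolver builds a resolver from in-memory tables.
func fakeResolver(ptr, fwd map[string][]string) resolver {
	return resolver{
		lookupAddr: func(ip string) ([]string, error) { return ptr[ip], nil },
		lookupIP: func(host string) ([]net.IP, error) {
			var out []net.IP
			for _, s := range fwd[host] {
				out = append(out, net.ParseIP(s))
			}
			return out, nil
		},
	}
}

func main() {
	fwd := map[string][]string{"crawl-1.googlebot.com": {"192.0.2.10"}}
	ptr := map[string][]string{
		"192.0.2.10":   {"crawl-1.googlebot.com."},
		"198.51.100.7": {"crawl-1.googlebot.com."}, // spoofed PTR record
	}
	r := fakeResolver(ptr, fwd)
	for _, ip := range []string{"192.0.2.10", "198.51.100.7", "203.0.113.5"} {
		ok, err := checkCrawler(r, []string{"googlebot.com"}, ip)
		fmt.Println(ip, ok, err)
	}
}
```

The forward lookup matters because whoever controls an IP range also controls its reverse DNS records and can make them claim any hostname; only the forward record is controlled by the owner of the crawler's domain.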