|
| 1 | +### About Cloudflare |
| 2 | +#### What is Cloudflare Bot Management |
| 3 | +Cloudflare is a web performance and security company. On the security side, they offer customers a [Web Application Firewall (WAF)](https://www.cloudflare.com/waf/). A WAF can defend applications against several security threats, such as cross-site scripting (XSS), credential stuffing, and DDoS attacks. |
| 4 | + |
| 5 | +One of the core systems included in their WAF is Cloudflare's Bot Manager. As a bot protection solution, its main goal is to mitigate attacks from malicious bots without impacting real users. |
| 6 | + |
| 7 | +About one in five of the websites you might need to scrape uses Cloudflare, whose strict anti-bot protection makes it easy to get blocked. |
| 8 | + |
| 9 | +#### How to bypass it? |
| 10 | +According to [this question on Stack Overflow](https://stackoverflow.com/questions/71529199/where-does-cloudflare-detect-web-and-terminal-requests-on-equal-terms), |
| 11 | +Cloudflare uses various techniques to determine whether the user agent is a real browser. The site owner can also configure, via the Cloudflare platform, the level of risk they are willing to accept. |
| 12 | +Here are a few techniques (that I know of) used by Cloudflare: |
| 13 | +* TLS fingerprinting: one of the most prominent techniques used by Cloudflare. It is also the reason why tools like [naiveproxy](https://github.com/klzgrad/naiveproxy) are popular. |
| 14 | +* Cookies: Cloudflare has used some `cf_`-prefixed cookies to distinguish real users from bots. |
| 15 | + |
| 16 | +These are only a few of the techniques; Cloudflare has many more. |
| 17 | + |
| 18 | +After many tests, the proxy-based solutions (naiveproxy, FlareSolverr) don't really work! The [curl-impersonate](https://github.com/lwthiker/curl-impersonate)-based solution works well. It involves these steps: |
| 19 | +* install curl-impersonate |
| 20 | +* compile node-libcurl with curl-impersonate |
| 21 | +* [test](curl/README.md) |
| 22 | +* rewrite leetcode-cli to replace request with modified node-libcurl |
| 23 | +* update the VS Code LeetCode extension to use the enhanced leetcode-cli |
| 24 | + |
| 25 | +Finally, we use the `curl_chrome116` command line + exec as the solution, because of the NODE_MODULE_VERSION incompatibility issue in VS Code. VS Code itself is built on Electron, but we built the modified version of node-libcurl with Node (18.12.0). Although the VS Code LeetCode extension spawns a separate node process (18.12.0) to run the underlying leetcode commands, we still got the NODE_MODULE_VERSION error. |
| 26 | + |
| 27 | +``` |
| 28 | +const childProc = wsl.useWsl() |
| 29 | + ? cp.spawn("wsl", [leetCodeExecutor_1.leetCodeExecutor.node, leetCodeBinaryPath, "user", commandArg], { shell: true }) |
| 30 | + : cp.spawn(leetCodeExecutor_1.leetCodeExecutor.node, [leetCodeBinaryPath, "user", commandArg], { |
| 31 | + shell: true, |
| 32 | + env: cpUtils_1.createEnvOption(), |
| 33 | + }); |
| 34 | +
|
| 35 | +this.executeCommandEx(this.nodeExecutable, [yield this.getLeetCodeBinaryPath(), "plugin", "-e", plugin]); |
| 36 | +``` |
| 37 | + |
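The "command line + exec" approach described above could look roughly like the sketch below. It assumes the `curl_chrome116` wrapper script from curl-impersonate is on `PATH`. The helper names `buildCurlArgs` and `curlImpersonate` are hypothetical and are not taken from leetcode-cli; only standard curl options are used.

```javascript
// Sketch only: shell out to the curl_chrome116 wrapper instead of loading node-libcurl,
// which sidesteps the native-module (NODE_MODULE_VERSION) problem entirely.
const { execFile } = require('node:child_process');

// Hypothetical helper: turn a request description into curl command-line arguments.
function buildCurlArgs({ url, method = 'GET', headers = {}, body }) {
  const args = ['--silent', '--show-error', '--request', method];
  for (const [name, value] of Object.entries(headers)) {
    args.push('--header', `${name}: ${value}`);
  }
  if (body !== undefined) {
    args.push('--data', body);
  }
  args.push(url);
  return args;
}

// Hypothetical helper: run the wrapper and resolve with the raw response body.
// The binary name is an assumption; adjust it to whichever wrapper is installed.
function curlImpersonate(request, binary = 'curl_chrome116') {
  return new Promise((resolve, reject) => {
    const options = { maxBuffer: 16 * 1024 * 1024 };
    execFile(binary, buildCurlArgs(request), options, (err, stdout, stderr) => {
      if (err) {
        reject(new Error(`${binary} failed: ${stderr || err.message}`));
      } else {
        resolve(stdout);
      }
    });
  });
}
```

A call such as `curlImpersonate({ url: 'https://leetcode.com/' }).then(console.log)` would then take the place of the original `request` call. Using `execFile` with an argument array rather than a shell string avoids quoting problems with header values.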
| 38 | +### Install curl-impersonate |
| 39 | +Refer to [INSTALL.md](https://github.com/lwthiker/curl-impersonate/blob/main/INSTALL.md#macos) |
| 40 | ++ install the prebuilt binary through brew |
| 41 | +``` |
| 42 | +brew tap shakacode/brew |
| 43 | +brew install curl-impersonate |
| 44 | +``` |
| 45 | ++ or compile & install from source code |
| 46 | +``` |
| 47 | +# Install dependencies for building all the components: |
| 48 | +brew install pkg-config make cmake ninja autoconf automake libtool |
| 49 | +# For the Firefox version only |
| 50 | +brew install sqlite nss |
| 51 | +pip3 install gyp-next |
| 52 | +# For the Chrome version only |
| 53 | +brew install go |
| 54 | +
|
| 55 | +# Clone the repository |
| 56 | +git clone https://github.com/lwthiker/curl-impersonate.git |
| 57 | +cd curl-impersonate |
| 58 | +
|
| 59 | +# Configure and compile |
| 60 | +mkdir build && cd build |
| 61 | +../configure |
| 62 | +# Build and install the Firefox version |
| 63 | +gmake firefox-build |
| 64 | +sudo gmake firefox-install |
| 65 | +# Build and install the Chrome version |
| 66 | +gmake chrome-build |
| 67 | +sudo gmake chrome-install |
| 68 | +# Optionally remove all the build files |
| 69 | +cd ../ && rm -Rf build |
| 70 | +``` |
| 71 | + |
| 72 | +### Compile node-libcurl with curl-impersonate |
| 73 | +Build node-libcurl from source on macOS |
| 74 | +``` |
| 75 | +
|
| 76 | +# install the build tool node-gyp |
| 77 | +npm i -g node-pre-gyp node-gyp |
| 78 | +# build & install node-libcurl from source, first time (to generate build files) |
| 79 | +# npm_config_build_from_source=true npm i node-libcurl |
| 80 | +# use yarn as npm doesn't create build folders and make files! |
| 81 | +npm_config_build_from_source=true yarn add node-libcurl |
| 82 | +npm_config_build_from_source=true npm_config_curl_static_build=false yarn add node-libcurl |
| 83 | +# static build runs successfully, but got missing symbol(dyld) at runtime (TODO) |
| 84 | +npm_config_build_from_source=true npm_config_curl_static_build=true yarn add node-libcurl |
| 85 | +
|
| 86 | +# got below error: |
| 87 | +# npm ERR! clang: error: no such file or directory: '/usr/include' |
| 88 | +# modify below make file and remove all /usr/include, save it |
| 89 | +vi ./node_modules/node-libcurl/build/node_libcurl.target.mk |
| 90 | +
|
| 91 | +# for static build, got below errors: |
| 92 | +# clang: error: no such file or directory: '/usr/lib/libcurl.@libext@' |
| 93 | +# clang: error: no such file or directory: '@LDFLAGS@' |
| 94 | +# clang: error: no such file or directory: '@LIBCURL_LIBS@' |
| 95 | +
|
| 96 | +# or we can do following tricks to "modify" curl-config |
| 97 | +# because build/config.gypi & build/node_libcurl.target.mk are generated based on curl-config |
| 98 | +cp /usr/bin/curl-config /usr/local/bin/curl-config |
| 99 | +vi /usr/local/bin/curl-config |
| 100 | +# modify & save, make sure /usr/local/bin is before /usr/bin in PATH env var |
| 101 | +# reload the shell: source ~/.zshrc |
| 102 | +
|
| 103 | +# then build it again with node-gyp |
| 104 | +cd ./node_modules/node-libcurl |
| 105 | +node-gyp build |
| 106 | +
|
| 107 | +# verify the lib/binding/node_libcurl.node file |
| 108 | +otool -L lib/binding/node_libcurl.node |
| 109 | +
|
| 110 | +lib/binding/node_libcurl.node: |
| 111 | + /usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0) |
| 112 | + /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1600.157.0) |
| 113 | + /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1336.61.1) |
| 114 | +``` |
| 115 | + |
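The `otool -L` check above can also be scripted, for example to confirm after each rebuild which libcurl the binding links against. This is a minimal sketch that only parses text in the format shown above; the helper names are hypothetical.

```javascript
// Hypothetical helper: extract the linked library paths from `otool -L` output.
function parseOtoolLibraries(output) {
  return output
    .split('\n')
    .slice(1) // the first line names the inspected file itself
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => line.replace(/\s+\(compatibility version.*\)$/, ''));
}

// Hypothetical helper: keep only the entries that look like a libcurl variant.
function linkedCurlLibraries(output) {
  return parseOtoolLibraries(output).filter((path) => /libcurl/.test(path));
}
```

The output text could come from `child_process.execFileSync('otool', ['-L', bindingPath], { encoding: 'utf8' })`. If the build picked up curl-impersonate, the returned path would presumably no longer be `/usr/lib/libcurl.4.dylib`.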
| 116 | + |
| 117 | +``` |
| 118 | +# also noticed the reported version is changed |
| 119 | +# console.log(Curl.getVersionInfo()); |
| 120 | +node curl/lc.js |
| 121 | +``` |
| 122 | + |
| 123 | + |
| 124 | +``` |
| 125 | +# run test, it works! |
| 126 | +node curl/lc.js |
| 127 | +
|
| 128 | +### questions ??? |
| 129 | +# otool -L node_modules/node-libcurl/lib/binding/node_libcurl.node |
| 130 | +to find out which libcurl.4.dylib is loaded (as /usr/lib/libcurl.4.dylib doesn't exist!) ? |
| 131 | +otool |
| 132 | +lib/binding/node_libcurl.node: |
| 133 | + /usr/lib/libcurl.4.dylib (compatibility version 7.0.0, current version 9.0.0) |
| 134 | +
|
| 135 | +# DYLD_PRINT_LIBRARIES=1 node curl/lc.js |
| 136 | +... ... |
| 137 | +dyld[69574]: <4528259C-8493-3A0C-8B35-F29E87F59EED> /Users/harry/test/leetcode-cli/node_modules/node-libcurl/lib/binding/node_libcurl.node |
| 138 | +dyld[69574]: <90815EBD-89C8-33E7-8B86-5A024176BC15> /usr/lib/libcurl.4.dylib |
| 139 | +... ... |
| 140 | +looks like macOS loads /usr/lib/libcurl.4.dylib from the dyld shared cache rather than from a file on disk |
| 141 | +
|
| 142 | +# references |
| 143 | +# https://github.com/lwthiker/curl-impersonate |
| 144 | +# https://github.com/lwthiker/curl-impersonate#libcurl-impersonate |
| 145 | +# https://github.com/lwthiker/curl-impersonate/issues/80#issuecomment-1166192854 |
| 146 | +# https://github.com/JCMais/node-libcurl?tab=readme-ov-file#building-on-macos |
| 147 | +``` |
| 148 | + |
| 149 | +### Test Conclusion |
| 150 | +| Not working | Working | |
| 151 | +| ----------- | ----------- | |
| 152 | +| original node-libcurl | curl-impersonate | |
| 153 | +| naiveproxy | node exec + curl-impersonate | |
| 154 | +| | modified node-libcurl | |
| 155 | + |
| 156 | +### TODO |
| 157 | +- Fix the NODE_MODULE_VERSION error by building node-libcurl [with electron](https://github.com/JCMais/node-libcurl?tab=readme-ov-file#electron-aka-atom-shell) |
| 158 | +``` |
| 159 | +Failed to list problems: Error: The module '/Users/harry/.vscode/extensions/leetcode.vscode-leetcode-0.18.1/node_modules/vsc-leetcode-cli/node_modules/node-libcurl/lib/binding/node_libcurl.node' was compiled against a different Node.js version using NODE_MODULE_VERSION 108. This version of Node.js requires NODE_MODULE_VERSION 118. Please try re-compiling or re-installing the module (for instance, using `npm rebuild` or `npm install`).. |
| 160 | +``` |
| 161 | +- Try different install locations for the node-libcurl |
| 162 | + |
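To make the NODE_MODULE_VERSION mismatch above easier to diagnose, the two ABI numbers can be pulled out of the error message and compared with the ABI of the running process. The helper `parseAbiMismatch` is hypothetical, while `process.versions.modules` is the standard Node.js field that reports the runtime's ABI version.

```javascript
// Hypothetical helper: extract both ABI numbers from a NODE_MODULE_VERSION error message.
function parseAbiMismatch(message) {
  const compiled = /using\s+NODE_MODULE_VERSION (\d+)/.exec(message);
  const required = /requires\s+NODE_MODULE_VERSION (\d+)/.exec(message);
  if (!compiled || !required) {
    return null; // not an ABI mismatch error
  }
  return { compiled: Number(compiled[1]), required: Number(required[1]) };
}

// ABI version of whatever runtime executes this script (Node.js or Electron).
function runtimeAbi() {
  return Number(process.versions.modules);
}
```

For the error shown above this yields `{ compiled: 108, required: 118 }`: the binding was built for ABI 108 (Node 18), while the runtime that loaded it expects 118. Logging `runtimeAbi()` from inside the extension's process would show which runtime actually loads the binding.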
| 163 | +### Update the vscode leetcode extension |
| 164 | + |
| 165 | +### References |
| 166 | +- [Could not login with both 'leetcode user -l' and 'leetcode user -c'](https://github.com/skygragon/leetcode-cli/issues/218) |
| 167 | +- [Cannot login with premium account](https://github.com/skygragon/leetcode-cli/issues/194) |
| 168 | +- [Failed to log in with a leetcode.com account](https://github.com/LeetCode-OpenSource/vscode-leetcode/issues/478), [a comment](https://github.com/LeetCode-OpenSource/vscode-leetcode/issues/478#issuecomment-564757098) |
| 169 | +- Proxy Server to bypass Cloudflare: [FlareSolverr](https://github.com/FlareSolverr/FlareSolverr), [naiveproxy](https://github.com/klzgrad/naiveproxy) |
| 170 | +- [How To Bypass Cloudflare in 2024](https://scrapeops.io/web-scraping-playbook/how-to-bypass-cloudflare/) |
| 171 | +- [How to Bypass Cloudflare in 2024: The 8 Best Methods](https://www.zenrows.com/blog/bypass-cloudflare) |
| 172 | +- [How to bypass Cloudflare when web scraping in 2024](https://scrapfly.io/blog/how-to-bypass-cloudflare-anti-scraping/) |
| 173 | +- [node abi versions](https://github.com/nodejs/node/blob/main/doc/abi_version_registry.json) |
| 174 | + |
| 175 | +### Archived notes |
| 176 | +``` |
| 177 | +# node-gyp related files: |
| 178 | +~/Library/Caches/node-gyp/20.11.1/include/node/common.gypi |
| 179 | +~/Library/Caches/node-gyp/20.11.1/include/node/config.gypi |
| 180 | +./node_modules/node-gyp/addon.gypi |
| 181 | +./node_modules/node-libcurl/build/config.gypi |
| 182 | +./node_modules/node-libcurl/build/node_libcurl.target.mk |
| 183 | +
|
| 184 | +# other stuff (useless) |
| 185 | +# no build running |
| 186 | +npm install node-libcurl --verbose --build-from-source --curl_static_build=false --update-binary |
| 187 | +
|
| 188 | +# rebuild the node_libcurl.node binding |
| 189 | +npm rebuild node-libcurl --update-binary |
| 190 | +
|
| 191 | +export @LDFLAGS@="-L/usr/local/lib -L$(xcrun --show-sdk-path)/usr/lib -L/usr/lib" |
| 192 | +export @LIBCURL_LIBS@="-L/usr/local/opt/curl/lib" |
| 193 | +
|
| 194 | +export CFLAGS="-I/usr/local/include" |
| 195 | +export CXXFLAGS="-I/usr/local/include" |
| 196 | +export CPPFLAGS="-I/usr/local/include" |
| 197 | +export LDFLAGS="-L/usr/local/lib -L/usr/local/Cellar/curl/0.6.0-alpha.1/lib" |
| 198 | +export LIBRARY_PATH="/usr/local/lib -L/usr/local/Cellar/curl/0.6.0-alpha.1/lib" |
| 199 | +$(xcrun --show-sdk-path)/usr/include |
| 200 | +# Set environment variables for include and lib directories |
| 201 | +export CURL_INCLUDE_DIR=/usr/local/Cellar/curl/8.6.0/include/curl |
| 202 | +export CURL_LIB_DIR=/usr/local/Cellar/curl-impersonate/0.6.0-alpha.1/lib |
| 203 | +
|
| 204 | +# use the default macOS clang compiler! do NOT use other compilers |
| 205 | +CC=gcc-13 CXX=g++-13 npm_config_build_from_source=true yarn add node-libcurl |
| 206 | +CC=llvm-gcc CXX=llvm-g++ npm_config_build_from_source=true yarn add node-libcurl |
| 207 | +
|
| 208 | +npm install node-libcurl --build-from-source --curl_libraries='-Wl,-rpath /usr/local/lib -lcurl' |
| 209 | +npm install node-libcurl --build-from-source --curl_libraries='-Wl,-rpath /usr/local/lib -lcurl-impersonate-chrome' |
| 210 | +
|
| 211 | +leetcode-cli locations: |
| 212 | +- ~/.nvm/versions/node/v18.12.0/lib/node_modules/leetcode-cli-plugins |
| 213 | +- ~/.nvm/versions/node/v18.12.0/lib/node_modules/vsc-leetcode-cli |
| 214 | +- ~/.nvm/versions/node/v18.12.0/bin/leetcode |
| 215 | +- ~/.vscode/extensions/leetcode.vscode-leetcode-0.18.1/ |
| 216 | +- ~/.vscode/extensions/leetcode.vscode-leetcode-0.18.1/node_modules/vsc-leetcode-cli/ |
| 217 | +``` |