“As long as you painstakingly follow all the steps,
it is 10 billion percent possible.
That is science.”
Senku, Dr. Stone
Over the last weekend, I finally finished the massive refactoring of my two-year-old Python web scraping project. It is now online in a repository, and the program has been divided into proper modules, so I can focus on improving each part on its own. The downloader is now multithreaded, has improved sorting, and supports more sites.
After that, I wanted to build something lighter and more accessible. There’s nothing more dubious than distributing an .exe online with hard dependencies that need to be explained and installed. My first thought was an actual webpage, inspired by https://qsniyg.github.io/maxurl/, a tool that takes image URLs and finds higher-resolution versions. A *small* script on a page would do, right? Or so I thought.
So off I went to GitHub to set up a new repository and project page. I got that working and added simple HTML for a textbox. The first problem I ran into was access to Twitter’s API. That would have been the easiest route, but it turns out I just can’t have my API keys hardcoded in a public, open-source script. Getting the data would have to be done with old-school text scraping.
An intuitive place to serve the tool was somewhere familiar: I wanted a new button above Copy link to Tweet. The first step was to find the dropdown and append the button. There were a few more intricacies to it, such as matching the element’s CSS so the button fits in, adding the new onclick event, and actually closing the dropdown after use. Those were solved with good old Stack Overflow posts and YouTube tutorials.
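The button step can be sketched roughly like this. It is a minimal sketch, not the code from the repository: `addMenuButton` is a hypothetical helper, and it assumes a reference to the existing Copy link to Tweet element is already in hand. Cloning that element is just one way to pick up its CSS classes.

```javascript
// Hypothetical helper, not taken from the project: clone an existing menu
// entry so the new one inherits its markup and CSS classes, then insert
// the clone directly above the original.
function addMenuButton(existingItem, label, onClick) {
  // Listeners attached with addEventListener are not copied by cloneNode.
  const item = existingItem.cloneNode(true);
  // Replaces the cloned label (and any child nodes) with the new text.
  item.textContent = label;
  item.addEventListener("click", (event) => {
    event.preventDefault(); // keep a cloned link or button from navigating
    onClick(event);
  });
  existingItem.parentNode.insertBefore(item, existingItem);
  return item;
}
```

How to locate the existing entry and how to close the dropdown afterwards depend on Twitter’s markup, so both are left out of the sketch.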
Now I have a working event and a reference to the parent Tweet. It’s time to scrape the data for the Discord post. The easy ones come first: the username and the permalink. For the date string, I find the tweet’s text body and run regular expression searches over it. The image content links are already embedded in the tweet, so it takes only a little more digging to collect them into an array. Then I parse the image links so they point to the original resolutions, put everything together into a formatted Discord post, and pass that to the clipboard with some hacks. Basically, that’s done. But… Twitter has a dynamic stream of content on the timeline page. New tweets get loaded in after the initial page load, and at this point the userscript does not affect them. That needs to be solved.
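Here is a rough sketch of the last two steps, the image links and the post itself. Both function names are hypothetical and the post layout is my own guess, not necessarily what the tool produces. It also assumes Twitter media links carry their size either as a `:large`-style suffix or as a `name=` query parameter, with `orig` requesting the original upload.

```javascript
// Assumption: a Twitter media URL takes its size hint either as a ":size"
// suffix or as a "name=size" query parameter; "orig" asks for the original.
function toOriginalImageUrl(url) {
  const queryStart = url.indexOf("?");
  if (queryStart !== -1) {
    // Query style: keep the other parameters and overwrite only "name".
    const params = new URLSearchParams(url.slice(queryStart + 1));
    params.set("name", "orig");
    return url.slice(0, queryStart) + "?" + params.toString();
  }
  // Suffix style: drop a known size suffix if present, then append ":orig".
  return url.replace(/:(thumb|small|medium|large|orig)$/, "") + ":orig";
}

// Hypothetical post layout; the real tool's format may differ.
// Angle brackets around the permalink keep Discord from embedding it.
function formatDiscordPost({ username, date, permalink, images }) {
  const header = `**@${username}** (${date})`;
  return [header, `<${permalink}>`, ...images.map(toOriginalImageUrl)].join("\n");
}
```

The scraped username, date string, permalink, and image array go in as one object, and the returned string is what would end up on the clipboard.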
Part 2 is coming soon, with more polish and the New Twitter problem. For anyone interested, the project is currently on GitHub: https://github.com/roguesleipnir/reaper.lite