This is not quite complete; I’ll update the last item or two later.
I finally got my entire tweet catalog organized earlier this year: hashtagged for topics, and tagged for tweetcasted events & speakers. I also established a Fave field for what I consider good tweets and a Classic field for the great ones. Other fields include date & time (obviously), a checked/unchecked field I use for general organizational stuff, a field that starts with the text of the tweet followed by the speaker's name if it's not me, and a field that counts the number of characters in the tweet. This will be the first time I've added to the organized catalog, so I'm recording the steps so I won't have to figure it out again next year.
Request Twitter Archive
The first step is to get my Twitter archive. In a web browser on my laptop, there's a More button at the bottom of the left sidebar. Click that, then on the pop-up window click Settings & Privacy. On the next screen, click Your Twitter Data under Data & Permissions. Under Download your Twitter data, there's one button for Twitter and another for Periscope. Click Twitter; it'll ask for my password. It takes a while to put it all together, but in the next hour or two I'll get a notification saying it's ready. Note that Twitter lets people download their data only once per month, so it's important to pick the time carefully to avoid having to copy a bunch of new tweets into the catalog by hand.
Convert Archive
Until this year, the archive included a CSV of all tweets back to the beginning. This year, they added a bunch of new stuff, including all of the media (pics & movies) you've ever posted, which is nice. But they inexplicably left out the Index.html file, where you could see all the tweets just as they appeared, and the CSV file, which you could easily open in any spreadsheet. Now everything except the media is in JSON files, which only a tiny fraction of people know how to manipulate. It took me several days to figure out how to convert it so I could open it in Mac Numbers.
Searching for a solution, it was obvious that many other people have had the same problem, and none of the ones I found had discovered a fix that non-programmers could easily use. So I tried to find an app or an online converter, which you would think would be easy. But every one I tried had some sort of problem: they either said "conversion failed," or they tried to extort me for hundreds of dollars per year to get what they said was a successfully converted file. Obviously, I'm not going to do that.
Finally, I noticed that several of the failed conversions kept saying there was a problem with the first line of the file. Fortunately, I had needed to download my archive a few months ago and was able to convert it successfully with one of the online converters. It turns out that Firefox can open JavaScript files as text, so I opened both of them. Sure enough, the first line of the new file that didn't work was "window.YTD.tweet.part0 = [ {" while the first line of the old file that did work was just "[ {". So I deleted the offending "window.YTD.tweet.part0 = " and saved the file. And I should note that while YTD implies it's only Year To Date, the file actually contains all my tweets back to the beginning, like it always has.
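If you'd rather not hand-edit a big file in a browser, the same fix is a few lines of Python. This is just a sketch, assuming the archive file is named tweet.js and sits in the same folder as the script:

    import json

    with open("tweet.js", encoding="utf-8") as f:
        raw = f.read()

    # Strip the JavaScript assignment so what's left is a plain JSON array
    prefix = "window.YTD.tweet.part0 = "
    if raw.startswith(prefix):
        raw = raw[len(prefix):]

    tweets = json.loads(raw)  # now a regular list of tweet records
    print(len(tweets), "tweets loaded")

    # Save the stripped version for the converters
    with open("tweet.json", "w", encoding="utf-8") as f:
        f.write(raw)

Once it parses cleanly, any JSON-to-CSV converter should accept the stripped file.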
The converter I originally used, and which I was really happy with, is the very simple & totally free https://onlinecsvtools.com/convert-json-to-csv by browserling. Click Import From File on the JSON side and it'll load the file in a few seconds (upload time depends on file size, of course). Click Use Headers in the box just below the JSON box so it'll convert the first line to a header row. Very quickly, it converts the file and the data pops up in the CSV window; if it doesn't pop up, there's something wrong with the file. Click Save As… to get the file. Unfortunately, it only read a few of the columns. While they were the most important ones, there was a lot of useful stuff missing.
So I went back and looked at a conversion I'd done in early November, which was on convertcsv.com, and returned to that site to convert the data. Simply put, it was one of the most painful things I've ever done; nearly all of New Year's Eve was spent trying to shepherd it through. The first several times, the browser cranked on it for several minutes, then reset itself due to lack of memory. And to be fair, Xfinity was incredibly slow that day, so that may have had something to do with it. I ended up doing it in Firefox, since I had a bunch of windows open in Safari and didn't want to close them. Before running it, I restarted my Mac just to be sure it was as clean as possible. Still, it repeatedly ran low on memory, so I'd follow it in the CleanMyMac widget in the menu bar and free up memory every time it got below 10 gigs. It was absolutely painstaking, because I couldn't do anything other than sit there, play games on my phone, and watch it out of the corner of my eye. But fiiiiiiiinally, it finished and I was able to download the converted data.
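In hindsight, a local conversion would sidestep both the upload and the browser-memory problems. Here's a sketch in Python using pandas, with a little flattener that mimics the slash-indexed column names (like entities/urls/0/expanded_url) that convertcsv.com produced; the "tweet" unwrapping at the end is an assumption about how some archive versions nest each record:

    import json
    import pandas as pd

    with open("tweet.js", encoding="utf-8") as f:
        raw = f.read()
    prefix = "window.YTD.tweet.part0 = "
    if raw.startswith(prefix):
        raw = raw[len(prefix):]
    tweets = json.loads(raw)

    def flatten(obj, path=""):
        # Recursively flatten nested dicts & lists into slash-indexed keys,
        # e.g. entities/urls/0/expanded_url
        out = {}
        if isinstance(obj, dict):
            for key, val in obj.items():
                out.update(flatten(val, f"{path}{key}/"))
        elif isinstance(obj, list):
            for i, val in enumerate(obj):
                out.update(flatten(val, f"{path}{i}/"))
        else:
            out[path.rstrip("/")] = obj
        return out

    # Some archive versions wrap each record under a "tweet" key; unwrap if so
    rows = [flatten(t.get("tweet", t)) for t in tweets]
    pd.DataFrame(rows).to_csv("tweets.csv", index=False)

This runs in seconds on a laptop, produces every column, and nothing ever leaves your machine.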
Delete Columns
When I opened the file in Numbers, I got an error message saying that Numbers only reads the first 256 columns, and there were over 500. Assuming that Twitter probably put the most important stuff near the beginning, I said OK and opened it. That assumption turned out to be correct: in the spreadsheet structure, they put the first few of each item type near the beginning, with more buried deeper. Towards the end, the data gets really sparse, so I doubt there was anything essential in the columns that didn't load. I also found many columns that contain stuff I know I don't need, or that duplicate other info, such as the full URL of a link (which I kept) versus various other iterations (like the Twitter-shortened URL) of exactly the same thing.
Without getting into the rationale of why I deleted columns, here are the ones I kept: favorite_count, retweet_count, created_at, full_text, entities/urls/0/expanded_url, entities/media/0/expanded_url, people mentioned & their usernames 11 deep (I’ll probably only need the first one, if even that), and hashtags 6 deep.
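Incidentally, if you converted locally as sketched above, you can sidestep Numbers' 256-column limit entirely by trimming the CSV before importing it. Another sketch; the mention and hashtag column prefixes are my guess at how the converter names them, so check yours:

    import pandas as pd

    df = pd.read_csv("tweets.csv")
    keep = [c for c in df.columns
            if c in ("favorite_count", "retweet_count", "created_at", "full_text",
                     "entities/urls/0/expanded_url", "entities/media/0/expanded_url")
            or c.startswith("entities/user_mentions/")   # people mentioned & usernames
            or c.startswith("entities/hashtags/")]       # hashtags
    df[keep].to_csv("tweets-trimmed.csv", index=False)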
Fix Date
- There are several extraneous things in the date & time, so I add a new working column with a text formula that keeps only what I need. Currently, the date & time look like this: Mon Jan 26 17:31:04 +0000 2009. I don't need the day of the week or the +0000, but I do need the year with the month & date so Numbers will read it correctly. The formula is: MID(E2,5,3)&" "&MID(E2,9,2)&" "&RIGHT(E2,4)&" "&MID(E2,12,8). This converts the date & time above into: Jan 26 2009 17:31:04. I paste this into every cell of the column. (There's a Python equivalent after this list.)
- So the dates look right on the screen, but they’re still just formulas, so I create another column named date & time beside the formula column. I copy the formula column and Paste Formula Results into the date & time column, so Numbers sees it as actual dates & times, not formulas.
- Change the date & time format to look like this: 2009-01-26 17:31:04, and set it to left-justify.
- Sort by date & time.
- Delete the created_at column & the date working column.
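For the record, here's the same date conversion in Python, in case you'd rather fix the timestamps before importing. A sketch; datetime's strptime understands Twitter's timestamp format directly:

    from datetime import datetime

    raw = "Mon Jan 26 17:31:04 +0000 2009"
    dt = datetime.strptime(raw, "%a %b %d %H:%M:%S %z %Y")
    print(dt.strftime("%Y-%m-%d %H:%M:%S"))  # -> 2009-01-26 17:31:04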
Copy Orphan Tweets
I do a holiday mailing each year that contains some of my best work on Twitter for the year, so I have to harvest the archive before the year is over. This means there are always some orphan tweets that didn’t make it into the catalog already. So I find those and copy the date & time and the text of the tweet.
Delete Previous Years
I make a Filter for all dates before 1/1/19 (or whatever the first day of the current year is), then delete all the rows it shows, since they're from previous years. Because the table is already sorted by date & time, these rows are contiguous, so this filtered delete is safe, unlike the one warned about below.
Total Faves & RTs & Measure Length
I added a column after the fave & RT columns totaling the two, for a sense of the impact of the tweet, in the interest of assigning fave & classic settings in the final document. I also added a length column before the text of the tweets measuring total characters.
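For the record, the two formulas are trivial; assuming faves in column A, RTs in column B, and the tweet text in column E (adjust to your actual layout):

    Total:  =A2+B2
    Length: =LEN(E2)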
Delete Short Tweets
I sort the Length column, and delete most of the short tweets which tend to be stuff like “Thanks!” For this year, the shortest tweet I found that was useful standing alone was 45 characters.
Delete Original Green Daily
I do a paper.li newsletter every day. When paper.li puts it out, it tweets "The Original Green Daily is Out!" with a link. So I sort by text, which puts them all together so I can delete all 365 of them at once. DON'T filter for that text: while the filter will only show the 365, if you select those rows and delete, it'll delete everything in between as well, basically wiping out your database. If you do, Undo before you save!
Clean Out Long Copy Chains
The longest tweets will be those with long copy chains, where many people are in the conversation. My longest one this year was over 900 characters. So strip out all the usernames to see what’s actually being said at the end. There’s no easy way to do this; allow 20-30 minutes.
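If you're comfortable running a script, a regex can do most of this grunt work. A sketch in Python; the \w pattern matches Twitter's letters/digits/underscore username rule, but eyeball the results, since a few tweets legitimately quote a username mid-sentence:

    import re

    def strip_mentions(text):
        # Remove a leading run of @username tokens
        return re.sub(r"^(@\w+\s*)+", "", text).strip()

    print(strip_mentions("@alice @bob_42 The real point I was making."))
    # -> "The real point I was making."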
The Long Edit Slog
This is the most time-consuming part except maybe the hashtagging; allow a couple of days depending on how much you're starting with. For this year, after deleting the stuff noted above, I had about 3,300 tweets; after this edit, I had about 2,300, so this pass takes out almost a third of the tweets. There are four things to do in this pass: 1. edit the tweets, 2. assign them fave & classic status, 3. move the names of people I'm quoting to the @ column, and 4. tag the event I'm tweetcasting in the tweetcast column.

The edit itself involves several things. Because the tweets stand alone in the catalog, I strip out all the @usernames at the beginning that didn't get taken out in the long copy chain cleaning above, and remove remnants of conversational text, leaving just the statement of principle. For tweetcasts, I move the speaker name to the @ column (either as @username if they're on Twitter or #FirstLast if not) and the event to the tweetcast column; once I have one of each, it's easy to duplicate. Then I strip the speaker & event names out of the text. I also delete a lot of tweets in this step that don't belong in the catalog: some because they're banal, others because they're conversational and no part of them can be properly understood standalone. Until this morning, I was deleting truncated retweets as well, but I may have found a way around that, so I'll leave them in for the moment (see The Retweet Problem).

Also, I don't want to have to read a tweet more than once, so while I'm here, I assign fave & classic status. Faves can be assigned for a number of reasons: 1. compelling content, 2. a memorable event to me (it's my catalog, after all), and 3. the number of times they were favorited or retweeted on Twitter. Classics are much rarer, and must be both unusually compelling in some way and also evergreen; in other words, they won't go out of date quickly. Every classic is also a fave.
Column Gymnastics
Because this is a working file, I toss the columns around a lot, putting what I need side by side as much as possible. I also hide columns I don’t need to see for clarity.
Fave & Classic Cleanup
Assign Hashtags
Once the edits are complete, it's time to assign hashtags. I've harvested 6 columns of hashtags from the Twitter archive (/0/ through /5/), but some of them were used to tag speakers who weren't on Twitter or the names of events I tweetcasted. I delete those entries from the six columns for clarity, then move the columns just to the left of the text column.
The Retweet Problem
Retweets in the archive are truncated to 140 characters, which makes all but the shortest text-only retweets useless. I've been searching for days for a way around this, and finally found something this morning. Apparently, something called Python can be used to access the Twitter API and get whatever info I want. I had an app called Komodo that I got for reasons I don't recall several years ago; apparently it can edit Python scripts, so I've downloaded the latest version. I've also just applied to Twitter for a developer account, so if they approve me, I'll update this post with that adventure.
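For what it's worth, here's the kind of script I'm hoping to run once I'm approved. A sketch using the tweepy library against Twitter's v1.1 API; the credentials are placeholders from the developer dashboard, and the tweet ID would come from the archive's id column:

    import tweepy

    # Placeholder credentials from the Twitter developer dashboard
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    api = tweepy.API(auth)

    # tweet_mode="extended" asks for the untruncated text
    status = api.get_status(1090000000000000000, tweet_mode="extended")

    if hasattr(status, "retweeted_status"):
        # For retweets, the full original text lives on the retweeted status
        print(status.retweeted_status.full_text)
    else:
        print(status.full_text)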
Copy to Yearly Template
I have a template file which looks just like the big catalog file, except it has 3 columns at the far left: faves, RTs & total. This is where I paste the data from the archive sheet, but those three columns won't get copied to the main sheet once I've set the fave & classic designations. As a reminder, those cells have a grey background. It's important to get the data into the template at this point so everything is properly formatted from here on.