Python Script to Save All Your Posts As Markdown or HTML Files

over 5 years ago (Edited)

I always wanted to be able to save my Steem posts locally. Once they are saved, they can be searched with better tools than the ones available at the blockchain level.
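Once the files are on disk, even the Python standard library is enough for a simple full-text search. The following is a minimal sketch. It assumes the posts were saved as .md/.html files under one folder (the script below defaults to `steem-posts-<author>`). `search_saved_posts` is a hypothetical helper and is not part of the script:

```python
import os

def search_saved_posts(root_dir, term):
    """Return sorted paths of .md/.html files under root_dir containing term (case-insensitive)."""
    term = term.lower()
    matches = []
    # os.walk visits root_dir and every subdirectory (e.g. date/2019-10 or tags/python)
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            if not name.endswith(('.md', '.html')):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding='utf-8', errors='replace') as f:
                if term in f.read().lower():
                    matches.append(path)
    return sorted(matches)

if __name__ == '__main__':
    # hypothetical folder name, following the script's default naming
    for path in search_saved_posts('steem-posts-testuser123', 'python'):
        print(path)
```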

I have only just started poking around the development APIs for Steem, and this is the first script with a real purpose I've written in Python. On top of that, I'm also kind of new to Ubuntu. :)

If you are a dev and have been doing this for a while, you can probably write a more efficient script.

I wasn't looking for efficiency when I wrote it. I wanted to learn, and hopefully to help others who are also Python beginners or who haven't tried coding with the Steem APIs yet. Hence the extensive comments.

Features and options:

saves all your markdown posts as .md files
saves all your raw HTML posts as .html files
you can set a main sub-directory or sub-path in the current directory where the files will be placed
posts will be placed in subdirectories based on the creation date (year-month) or primary tag - option to set at the beginning of the script
you can save the posts for any account
you can save resteemed posts as well or not
you can add tags at the end of the post or not
title is automatically added as H1 at the beginning of the post
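The directory and file naming rules in this list come down to a few lookups on each post's data. The following is a minimal sketch. It assumes a post dict shaped like the `comment` entries the script reads (`category`, `created`, `permlink`, `json_metadata`). The helper names are hypothetical:

```python
import json

def post_subdir(comment, dir_struct_option):
    """Subdirectory for a post: 'tags/<primary tag>' or 'date/<YYYY-MM>'."""
    if dir_struct_option == 'primary-tag':
        return 'tags/' + comment['category']
    if dir_struct_option == 'year-month':
        # 'created' looks like '2019-10-05T12:34:56'; the first 7 characters are YYYY-MM
        return 'date/' + comment['created'][:7]
    raise ValueError('unknown dir_struct_option: ' + dir_struct_option)

def post_filename(comment):
    """permlink + '.md' for Markdown posts, '.html' otherwise (Markdown if format is missing)."""
    try:
        metadata = json.loads(comment.get('json_metadata') or '{}')
    except ValueError:
        metadata = {}
    if not isinstance(metadata, dict):
        metadata = {}
    post_format = metadata.get('format', 'markdown+html')
    extension = '.md' if post_format in ('markdown', 'markdown+html') else '.html'
    return comment['permlink'] + extension

comment = {'category': 'python', 'created': '2019-10-05T12:34:56',
           'permlink': 'my-first-post', 'json_metadata': '{"format": "markdown"}'}
print(post_subdir(comment, 'year-month'))   # date/2019-10
print(post_subdir(comment, 'primary-tag'))  # tags/python
print(post_filename(comment))               # my-first-post.md
```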

I've tested the script on Python 3.7.4, but I believe it should work on earlier Python 3 versions. Also, the script is written for Linux/Ubuntu. On Windows, you will need to adapt the parts of the script that handle paths and create directories.
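For Windows, one way to reduce the path adaptation is to build paths with `pathlib` instead of concatenating `'/'`. Files should also be written as UTF-8 explicitly, because the default encoding on Windows may fail on emoji or other characters found in posts. The following is a minimal sketch. `save_post` is a hypothetical helper and is not part of the script:

```python
from pathlib import Path

def save_post(main_save_dir, subdir_name, filename, text):
    """Write text to main_save_dir/subdir_name/filename, creating directories as needed."""
    # Path uses the right separator for the current OS; a subdir_name like
    # 'date/2019-10' with forward slashes is accepted on Windows too
    target_dir = Path(main_save_dir) / subdir_name
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    # explicit UTF-8 so emoji and non-Latin text don't depend on the OS default encoding
    target.write_text(text, encoding='utf-8')
    return target
```

Calling it again for the same post overwrites the file, as the script does.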

You will also need a good Markdown viewer/editor to see the saved files. I used Typora, but it looks like it will become paid software once it leaves beta, so a good free alternative would be nice.

So, here's the Python script. Pay attention: the settings are hard-coded, so you'll have to change them manually.
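If you would rather not edit the script for every run, the same settings could be read from the command line. The following is a minimal sketch using the standard `argparse` module. The option names `--by`, `--no-tags` and `--resteems` are hypothetical, and the script itself does not support them. The defaults match the script's hard-coded values:

```python
import argparse

def parse_settings(argv=None):
    """Read the script's settings from command-line arguments instead of hard-coding them."""
    parser = argparse.ArgumentParser(description='Save Steem posts as .md/.html files')
    parser.add_argument('author_name')
    parser.add_argument('--by', dest='dir_struct_option', default='year-month',
                        choices=['year-month', 'primary-tag'])
    # store_false: tags are added unless --no-tags is given
    parser.add_argument('--no-tags', dest='adding_tags_to_saved_post', action='store_false',
                        help='do not append tags at the end of each post')
    # store_true: resteems are skipped unless --resteems is given
    parser.add_argument('--resteems', dest='include_resteem_posts', action='store_true',
                        help='also save posts resteemed from other authors')
    args = parser.parse_args(argv)
    args.main_save_dir = 'steem-posts-' + args.author_name
    return args
```

Usage would look like `python save-posts.py testuser123 --by primary-tag --resteems`, where the file name is whatever you saved the script as.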

While I'm far from a Python or Steem dev expert, if you have questions let me know.

Feedback from more experienced devs on how to improve it is welcome as well. :)

import os
import json
from steem import Steem
s = Steem()

# script parameters
# =================

# author
author_name = 'testuser123'

# relative directory under which the posts will be saved (don't add a final "/"!)
main_save_dir = 'steem-posts-' + author_name

# structure of directories under which posts will be saved
# Options:
# primary-tag - posts are saved under their primary tag subdirectory
# year-month - posts are saved under the year-month of their creation date subdirectory
dir_struct_option = 'year-month'
print('Save posts by ' + dir_struct_option)

# bool flag to determine if tags are added at the end of the post or not
adding_tags_to_saved_post = True
print('Adding tags to the end of each post? ' + str(adding_tags_to_saved_post))

# bool flag to determine if to save resteemed posts of other authors as well
include_resteem_posts = False
print('Include resteemed posts? ' + str(include_resteem_posts))

# =====================
# end script parameters
#

#create main save directory (as a subdirectory or sub-path of the current directory)
#os.getcwd() gives the actual current directory path (os.curdir is always just '.')
try:
    os.makedirs(main_save_dir)
    print('Directory ' + main_save_dir + ' created in current directory ' + os.getcwd())
except FileExistsError:
    print('Directory ' + main_save_dir + ' already exists in current directory ' + os.getcwd())
except OSError:
    print('Directory ' + main_save_dir + ' couldn\'t be created in current directory ' + os.getcwd())

#save current (working) directory
cur_dir_saved = os.getcwd()

# loops through all the posts of the given author
# we break out of the loop after we reach the last post of the author
i = 1
while True:
    
    #retrieve current blog post info
    #theoretically we can retrieve more than one blog entry per call, but in my tests anything more than 2 generated an error, so I preferred to take them one by one
    try:
        blogs = s.get_blog(author_name, i, 1)
    except Exception:
        print('Couldn\'t get blog #' + str(i) + '. Trying again. Ctrl+C to interrupt.')
        continue
    #is it empty? then we reached the end and we should break out of the loop
    if blogs == []: break

    #is it the author's post or a resteem?
    #if it's a resteem and resteems are not to be included, skip to the next iteration
    if blogs[0]['comment']['author'] != author_name:
        if not include_resteem_posts:
            print('Post #' + str(i) + ' author is ' + blogs[0]['comment']['author'] + '. Skipping it.')
            i += 1
            continue
        else:
            print('Post #' + str(i) + ' author is ' + blogs[0]['comment']['author'] + '. Including it.')

    #choose the name of the subdir where to place the saved posts
    #(i.e. posts can be saved by primary-tag or date [year-month])
    if dir_struct_option == 'primary-tag':
        subdir_name = 'tags/' + blogs[0]['comment']['category']
    elif dir_struct_option == 'year-month':
        subdir_name = 'date/' + blogs[0]['comment']['created'][0:7]
    else:
        raise ValueError('Unknown dir_struct_option: ' + dir_struct_option)

    #build the full path of the subdir (os.path.join inserts the separators)
    dir_name = os.path.join(cur_dir_saved, main_save_dir, subdir_name)

    #create the subdirectory/ies where we will place our files
    try:
        os.makedirs(dir_name)
        print('Directory ' + dir_name + ' created.')
    except FileExistsError:
        pass
    except OSError:
        print('Directory ' + dir_name + ' couldn\'t be created.')
        raise

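    #deserialize json_metadata (on some posts it may be empty or not valid JSON)
    json_metadata_str = blogs[0]['comment']['json_metadata']
    try:
        json_metadata_dict = json.loads(json_metadata_str)
    except (ValueError, TypeError):
        json_metadata_dict = {}
    if not isinstance(json_metadata_dict, dict):
        json_metadata_dict = {}

    #get the post format (named post_format to avoid shadowing Python's built-in format)
    try:
        post_format = json_metadata_dict['format']
    except KeyError:
        print('No "format" key in the post\'s json_metadata. Defaulting to "markdown+html".')
        post_format = 'markdown+html'

    #is the post markdown?
    if post_format == 'markdown+html' or post_format == 'markdown':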
        #choose the filename as the blog post's permlink + ".md" extension
        filename = blogs[0]['comment']['permlink'] + '.md'
        
        if (adding_tags_to_saved_post):
            #get tags and create a string with them to add at the end of the post
            try:
                tags_str = '\n\n'
                for x in json_metadata_dict['tags']:
                    tags_str += '#' + x + ' '
            except KeyError:
                tags_str = ''
        else: tags_str = ''

        #get post body
        body = blogs[0]['comment']['body']

        #get post title
        title = blogs[0]['comment']['title']

        #format the body to also include the title at the beginning as H1 and the tags (with #) at the end
        body_with_title_and_tags = '# ' + title + '\n\n' + body + tags_str
    #or is the post raw html?
    else:
        #choose the filename as the blog post's permlink + ".html" extension
        filename = blogs[0]['comment']['permlink'] + '.html'

        if (adding_tags_to_saved_post):
            #get tags and create a string with them to add at the end of the post
            try:
                tags_str = '\n\n'
                for x in json_metadata_dict['tags']:
                    tags_str += '<a id="' + x + '" href="#' + x + '">' + x + '</a> '
            except KeyError:
                tags_str = ''
        else: tags_str = ''

        #get post body
        body = blogs[0]['comment']['body']

        #get post title
        title = blogs[0]['comment']['title']

        #format the body to also include the title at the beginning as H1 and the tags (as anchor links) at the end
        body_with_title_and_tags = '<h1>' + title + '</h1>\n\n' + body + tags_str

    #write post to file (overwrite if it exists) as UTF-8, so emoji and non-Latin text are saved correctly
    file_path = os.path.join(dir_name, filename)
    try:
        with open(file_path, 'w', encoding='utf-8') as f:
            f.write(body_with_title_and_tags)
        print('Post #' + str(i) + ': ' + file_path + ' successfully saved.')
    except OSError:
        print('Something went wrong while attempting to write file ' + file_path)
        raise

    i+=1

print('No (more) posts.')

Update: I edited the post because the original version had some errors in the HTML part of the code. I had copy-pasted that part from the Markdown part and hadn't tested it initially.
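The Markdown and HTML branches of the script differ only in the file extension, the title markup and the tag markup. That kind of duplication invites copy-paste errors. One way to keep a single code path is a helper like the one below. It is a sketch of a possible refactor, not the code above. `render_post` is a hypothetical name, and `tags` is expected to be a list, which should be empty when tags are not wanted:

```python
def render_post(title, body, tags, post_format):
    """Return (extension, text) for a post, covering both the Markdown and the HTML case."""
    if post_format in ('markdown', 'markdown+html'):
        extension = '.md'
        heading = '# ' + title + '\n\n'
        tag_items = ['#' + tag for tag in tags]
    else:
        extension = '.html'
        heading = '<h1>' + title + '</h1>\n\n'
        tag_items = ['<a id="' + tag + '" href="#' + tag + '">' + tag + '</a>' for tag in tags]
    # tags go after a blank line, separated by spaces; an empty tag list adds nothing
    tags_str = ('\n\n' + ' '.join(tag_items)) if tag_items else ''
    return extension, heading + body + tags_str
```

Unlike the script, this version leaves no trailing space after the last tag.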

python save-allposts markdown html technology neoxian palnet


10 comments

@maxsieg 57

over 5 years ago (Edited)

Why use the steem module and not the beem module? Many steem features are no longer up to date.
https://github.com/holgern/beem
beem is a bit more up to date. What I noticed, though, when the api.steemit.com site was down, is that beem still relies on the steemit node even if you specify a different node. So when installing it from GitHub, you first have to replace every api.steemit.com in the source code with a different API node you trust.
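Not depending on a single node can also be handled at the request level: send the same JSON-RPC call to a list of nodes and use the first one that answers. The following is a minimal sketch that uses only the standard library. `rpc_call` is a hypothetical helper, and the second node URL in the usage comment is a placeholder. The `post` parameter lets you swap in another transport, such as `requests`:

```python
import json
import urllib.request

def _urllib_post(node, payload):
    """Default transport: POST the JSON payload with urllib and return the reply text."""
    request = urllib.request.Request(node, data=payload,
                                     headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode('utf-8')

def rpc_call(nodes, method, params, post=_urllib_post):
    """Send a JSON-RPC call to each node in turn; return the first successful 'result'."""
    payload = json.dumps({'jsonrpc': '2.0', 'method': method,
                          'params': params, 'id': 1}).encode('utf-8')
    errors = []
    for node in nodes:
        try:
            reply = json.loads(post(node, payload))
        except (OSError, ValueError) as exc:  # network failure or a reply that isn't JSON
            errors.append((node, exc))
            continue
        if isinstance(reply, dict) and 'result' in reply:
            return reply['result']
        errors.append((node, reply))  # e.g. a reply carrying a JSON-RPC 'error' object
    raise RuntimeError('all nodes failed: %r' % errors)

# usage (the second URL is a placeholder for a node you trust):
# nodes = ['https://api.steemit.com', 'https://your-trusted-node.example']
# entries = rpc_call(nodes, 'condenser_api.get_blog', ['testuser123', 1, 1])
```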

Also, I've been trying to write posts and upvote by sending the API requests directly with the requests module. That way I could update my code more flexibly instead of relying on another Steem user's library. But I haven't figured out yet how to correctly format the broadcast operation, and I haven't found anyone willing to help me yet...

But here is what I have, for example, to get the blog posts from your Steem account:

import json
import requests

def query(node, data, tor):
    headers = {'Content-Type': 'application/json'}
    if not tor:
        return requests.post(node, headers=headers, data=data)
    # send the request through a local Tor SOCKS proxy (socks5:// proxies need: pip install requests[socks])
    session = requests.session()
    session.proxies = {'http': 'socks5://127.0.0.1:9050', 'https': 'socks5://127.0.0.1:9050'}
    return session.post(node, headers=headers, data=data)

def get_blog(name, nod, tor, start, limit):
    # condenser_api.get_blog params: account, start entry id, number of entries
    payload = json.dumps({"jsonrpc": "2.0", "method": "condenser_api.get_blog",
                          "params": [name, start, limit], "id": 1})
    # return the list of blog entries from the JSON-RPC reply
    return json.loads(query(nod, payload, tor).text)["result"]

I haven't tested yet whether the Tor option works (and I see now that I made some mistakes in it). But sending the traffic to the local Tor SOCKS port would usually route it through Tor. The standalone tor service listens on port 9050, while Tor Browser normally uses 9150.

If someone would be so kind as to help me write a vote broadcast operation correctly, I would be very grateful.
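For reference on the vote question, the part that can be shown without a signing library is the operation itself. In the condenser API, a vote is a `["vote", {...}]` pair with `voter`, `author`, `permlink` and a `weight` from -10000 to 10000, where 10000 means 100%. The sketch below only builds that unsigned operation. Wrapping it in a transaction (reference block, expiration) and signing it with the voter's private posting key is what libraries such as beem handle, and that step is not shown here. `make_vote_op` is a hypothetical helper:

```python
def make_vote_op(voter, author, permlink, percent):
    """Build an unsigned Steem vote operation as a ['vote', {...}] pair."""
    if not -100 <= percent <= 100:
        raise ValueError('percent must be between -100 and 100')
    weight = round(percent * 100)  # 100% -> 10000; negative values are downvotes
    return ['vote', {'voter': voter, 'author': author,
                     'permlink': permlink, 'weight': weight}]
```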


@gadrian 76

over 5 years ago

why use steem module and not beem module? many steem features are no longer up to date.

I haven't seen Holger in a while. Will he or someone else keep updating beem? Not that there's anyone updating Steem APIs at Steemit, Inc. now.

You're already more experienced in Python and Steem/beem APIs than I am. Maybe you'll receive some guidance from someone who is even more experienced...


@petertag 63

over 5 years ago

As a note, I use VS Code (because I'm a dev, I guess) with an extension to preview .md files as I write them (basically like writing a post with a preview). There are probably similar free apps that aren't as massive as VS Code, though.


@gadrian 76

over 5 years ago

Yes, I used VS Code to write this Python script as well. Didn't try it for md though, but I will. Thanks for mentioning it.


@petertag 63

over 5 years ago

Just checked: I was using Markdown Preview Enhanced as the extension, though it looks like there are a few. No problem, nice script, man!


@gadrian 76

over 5 years ago

Great, I'll check it out. Thanks again!


@sathyasankar 65

over 5 years ago

Great. I will try this out.


@olaf123 -11

over 5 years ago

According to the Bible, Graven Images: Should You Worship These According to the Bible?

Watch the Video below to know the Answer...

(Sorry for sending this comment. We are not looking for our self profit, our intentions is to preach the words of God in any means possible.)

Comment what you understand of our Youtube Video to receive our full votes. We have 30,000 #SteemPower. It's our little way to Thank you, our beloved friend.
Check our Discord Chat
Join our Official Community: https://steemit.com/created/hive-182074


@the-real-jesus 51

over 5 years ago

My name is Jesus Christ and I do not condone this spamming in my name. Your spam is really fucking annoying @hiroyamagishi aka @overall-servant aka @olaf123 and your spam-bot army. This is not what my father, God, created the universe for. You must stop spamming immediately or I will make sure that you go to hell.

If anybody wants to support my eternal battling of these relentless religion spammers, please consider upvoting this comment or delegating to @the-real-jesus


@steemitboard 66

over 5 years ago

@gadrian, sorry to see you have less Steem Power.
Your level lowered and you are now a Red Fish!

Vote for @Steemitboard as a witness to get one more award and increased upvotes!
