I constantly use DocFetcher to index documents (PDFs, Excel, Word, HTML, etc.; DocFetcher website) and xplorer2 to view and store the results in a scrap container (xplorer2 website). One thing that has been bugging me for a while is that when I copy a list of documents from DocFetcher, I’ve had to copy it to Excel before I could paste into…
Author: ryan
Databases: PostgreSQL 9, Redis, MySQL, MongoDB, Cassandra (haven’t touched), Neo4j (graphs)
Data and Stream Processing: Kafka, Storm (haven’t touched), RabbitMQ, Celery, Puppet (haven’t touched)
Text Search: ElasticSearch, Solr, Tika, Lucene, DocFetcher (Tika & Lucene & Java)
Web Scraping: Scrapy, Requests
??: Spark (haven’t touched), Hadoop (haven’t touched)
Took advantage of my DreamSpark membership and upgraded my server to Windows Server 2016. So far it is not much different, besides the added Windows Defender integration and Containers (haven’t messed with those yet).
Have been messing around with NewsBlur, a Python RSS feed app that parses news feeds. It’s open source, so I’ve been trying to understand how it works, and I finally got it running on my local network. That got me thinking about somehow masking my IP (because I’ll be constantly requesting updates from various RSS feeds), so I looked…
Processing Data, Storing, Analyzing ASAP
Python – HTML Parsing
Databases – MongoDB, ??Redis
Displaying – ?? Django, Flask, D3, Javascript,…
Virtual Machines / App Managers / Message Queuing – Running multiple Processes
Docker – Linux / Windows Containers
Hyper-V – Windows Virtual Machine
RabbitMQ – Messaging Server
Text Processing / Document Searching – How can…
# andyb, stackoverflow
# http://stackoverflow.com/questions/21979360/powershell-how-to-create-directories-based-on-last-modify-date-of-files
# Script takes files in folderRoot and moves them to folders at PATH:\YYYY\MM\DD\level.file
$folderRoot = "B:\EXCEL\"
$days = 1
dir $folderRoot | ?{ (!($_.PsIsContainer)) -and ((get-date) - $_.lastwritetime).totaldays -gt $days } | %{
    [string]$year = $([string]$_.lastwritetime.year)
    [string]$month = $_.lastwritetime.month
    [string]$day = $_.lastwritetime.day
    # Target folder is built from the file's last-write date
    $dir = $folderRoot + $year + "\" + $month + "\" + $day
    if (!(test-path $dir)) {
        new-item -type container $dir
    }
    Write-output $_
    move-item $_.fullname $dir
}
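For comparison, here is a rough Python sketch of the same idea: move files older than a given number of days into year/month/day folders under the root. The function name `archive_by_date` and its parameters are my own for illustration, not part of the PowerShell answer, and I am assuming the same non-zero-padded folder names the script produces.

```python
import shutil
import time
from datetime import datetime
from pathlib import Path


def archive_by_date(folder_root, days=1, now=None):
    """Move files older than `days` into folder_root/YYYY/M/D; return the new paths."""
    root = Path(folder_root)
    now = time.time() if now is None else now
    moved = []
    # Snapshot the listing first, since subfolders are created while looping.
    for entry in list(root.iterdir()):
        if not entry.is_file():
            continue  # skip folders, like the PsIsContainer check
        mtime = entry.stat().st_mtime
        if (now - mtime) / 86400.0 <= days:
            continue  # modified too recently
        stamp = datetime.fromtimestamp(mtime)
        target = root / str(stamp.year) / str(stamp.month) / str(stamp.day)
        target.mkdir(parents=True, exist_ok=True)
        moved.append(Path(shutil.move(str(entry), str(target / entry.name))))
    return moved
```

Calling `archive_by_date(r"B:\EXCEL")` would then do roughly what the script above does. Only top-level files are touched, so folders created on earlier runs are left alone.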
This is by no means perfect, but it got the job done. If you are interested in using it, you will need to change the website URL, the name of the calendar id, and start_date.

    #! py27w
    # Written for Python 2.7 and the older Selenium API (find_element_by_*)
    from datetime import datetime, date

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    # Firefox profile that saves Excel downloads to disk without prompting
    fp = webdriver.FirefoxProfile()
    fp.set_preference('browser.download.folderList', 2)
    fp.set_preference('browser.download.manager.showWhenStarting', False)
    fp.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/vnd.ms-excel')
    fp.set_preference('browser.download.dir', r'c:\tmp')  # raw string so \t is not read as a tab
    driver = webdriver.Firefox(firefox_profile=fp)
    driver.get('https://www.zacks.com/earnings/earnings-reports')


    def wait_and_click(find):
        # Wait up to 10 seconds for the element, then click it once.
        # click() returns None, so it has to stay outside the until() condition.
        try:
            WebDriverWait(driver, 10).until(find).click()
        except Exception:
            return False
        return True


    def click_calendar():
        element_xpath = '//*[@id="earnings_release"]/div[1]/p/a'
        result = wait_and_click(lambda d: d.find_element_by_xpath(element_xpath))
        print("clicked calendar")
        return result


    def click_prev_day(x):
        return wait_and_click(lambda d: d.find_element_by_id('datespan_%d' % x))


    def click_export():
        return wait_and_click(lambda d: d.find_element_by_id('export_excel'))


    def click_prev_month():
        try:
            driver.find_element_by_id('prevCal').click()
        except Exception:
            return False
        # Select the last day that exists in the previous month (31, 30, 29 or 28)
        for i in range(31, 27, -1):
            if click_prev_day(i):
                return True
            print('could not find %s in prev month' % i)
        return False


    def subtract_day(n):
        return n - 1


    def start_date():
        return datetime(2016, 2, 29)


    def click_to_start_date():
        start_date = datetime(2016, 2, 28)
        a = date.today()
        b = start_date
        c = a.month - b.month
        if c > 0:
            click_calendar()
            while c > 0:
                click_prev_month()
                c -= 1
        if not click_prev_day(31):
            click_prev_day(30)


    def main():
        # click_to_start_date()
        # sdate = start_date()
        m = 12
        while m > 0:
            m -= 1
            # Export every day of the current month, newest first
            for x in range(31, 0, -1):
                click_calendar()
                click_prev_day(x)
                click_export()
            click_calendar()
            click_prev_month()


    if __name__ == '__main__':
        main()

A few areas that need improvement: click_prev_month() – had a little difficulty…
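One possible way to avoid trying days 31 down to 28 after stepping back a month is to compute the last day of the previous month with the standard library. This is only a sketch: `last_day_of_prev_month` and `datespan_id` are names I made up, and the only thing taken from the script is the 'datespan_%d' id pattern.

```python
import calendar
from datetime import date


def last_day_of_prev_month(today):
    """Return the last calendar day of the month before `today`."""
    # Step back one month, rolling the year over in January.
    if today.month == 1:
        year, month = today.year - 1, 12
    else:
        year, month = today.year, today.month - 1
    # monthrange() returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(year, month)[1]
    return date(year, month, days_in_month)


def datespan_id(day):
    """Build the calendar element id for a date, following the 'datespan_%d' pattern."""
    return 'datespan_%d' % day.day
```

With this, the script could look up `datespan_id(last_day_of_prev_month(date.today()))` directly instead of probing for ids that may not exist, which also handles leap years.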
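%matplotlib inline
# pandas.io.data has since moved to the separate pandas-datareader package
from pandas_datareader.data import DataReader
from datetime import date
from dateutil.relativedelta import relativedelta

# Last three months of GOOG prices from Yahoo
goog = DataReader('GOOG', 'yahoo', date.today() + relativedelta(months=-3))
goog.tail()
goog.plot(y='Adj Close');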
only took three tries, only…
1. Install Windows Server 2016 R5
a. Rename computer, create domain, join domain, install .NET 3.5, install IIS
    $domainName = "domain.com"
    $safeModeAdminPassword = ConvertTo-SecureString "password" -AsPlainText -Force
    Rename-Computer -NewName "Server01" -DomainCredential domain\admin -Restart
    Add-WindowsFeature AD-Domain-Services -IncludeManagementTools
    Install-WindowsFeature -Name NET-Framework-Core -Source E:\Sources\sxs
    # -WhatIf only previews the install; remove it to actually add the role
    Install-WindowsFeature -Name Web-Server -IncludeAllSubFeature -ComputerName Server01 -WhatIf
    Install-ADDSForest -DomainName $domainName -SafeModeAdministratorPassword $safeModeAdminPassword -Confirm:$false
2. Install SQL Server 2016
a. Go through setup, install the prerequisites, then create the instance where the SharePoint database will be stored…
SharePoint Online doesn’t allow public access anymore, so I can either pay $100+ for hosting or try to host it myself using a VM and Server/SharePoint 2016. I figured out how to deploy a SharePoint farm and got the blog set up; however, I’m stuck on port forwarding and hosting under my ryansmccoy.com domain. Will try again later.