Showing posts with label technology. Show all posts

Tuesday, August 5, 2014

Languages..

I love coding.. The cool thing about coding is that it gives the developer the power to create stuff that can affect lives or solve a problem for a hopefully growing set of people, and probably bring up smiles..
I've learnt and played with various languages.. I smile when I think of the attitude I once had towards languages.. It's changed radically..
  • I learnt C during my college days and loved it, enjoyed it thoroughly..
  • During college, I was like: C is the only language I'll learn, and I'll code anything and everything in C itself.. So I used to explore how to call or run code written in other languages, like calling Matlab code from C and vice versa.. Those were the pointer* days.. The idea that I'd do everything in C itself was a severely flawed notion, as I realized in the times to come..
  • Then came my job at Solidcore.. I was introduced to C++, another interesting language.. But in some ways a funny language, coz for all the control C++ gives in terms of speed and OOPsies, the syntax has become somewhat complex coz of the backward compatibility with C..
    • An example of the control C++ gives: by default you get compile-time binding of methods, but by marking a method virtual you can impose late binding, which has its overhead.. Now it's up to you to decide whether you'll accept the extra instructions that polymorphism needs or you want the speed; whereas in Java you get dynamic binding by default.
  • Slowly I started getting interested in the web and the associated technologies, aka the J2EE stack for the backend. So I was introduced to writing enterprise solutions in Java. Java today is a language which gives you:
    • robustness - it's a compiled language, with type safety, brilliant concurrency support, and some of the most well-thought-out system libraries, e.g. Collections.
    • scale - coz of its inherent OOPsy nature..
    • speed of development, coz of the brilliant editor support. Oh yes, Eclipse..
Again, as a result of my curiosity to check out the interfaces with the native world, I was introduced to JNI and did quite a bit of JNI coding.
  • You can't just develop a usable web product by only doing the backend.. So next comes the frontend, and with it comes JavaScript.. One word which comes to my mind for JavaScript is versatility.. Write any code and it will work well, just in case it's a few lines.. But in today's world, with the extra-thick frontend layer and the many lines of JavaScript we've got to write, if you're irresponsible with JavaScript you end up with spaghetti code, which is a maintenance nightmare.. One has to be super disciplined. The good thing about JavaScript is that if you know the patterns well, and use the flexibility + versatility the language provides in the right structured way, things work great and your code scales and is maintainable.. Add linting + powerful frameworks (AngularJS, jQuery) and your frontend is now taken care of..
  • You need to know a good scripting language.. Back in the days when there was no Node.js, I was introduced to Python.. A language which is elegant in itself.. I wrote quite a bit of code in Python and it was a thoroughly enjoyable experience.. Sometimes Python just reads like pseudocode.. Any piece of logic, file manipulation, etc., and Python does the trick.. You understand the significance and power of the dict = {} and list = [] literals, and how they solve so many problems.. I've mostly written one-file Python scripts which solve anything from a simple problem to a fairly complex one, like a tool or something.. But a whole product built on this would again mean learning frameworks + tools etc. Somehow Java and the J2EE stack fulfilled that need for me. Another point worth mentioning is the packaging of your scripts/tools.. Packaging a script as a binary using libraries like py2exe suddenly makes it so very shippable, installable and usable for Windows users..
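To illustrate the point about the dict = {} and list = [] literals, here is a tiny sketch (the text and task are invented for illustration): counting word frequencies takes just a few readable lines:

```python
# Count word frequencies with nothing but dict and list literals
text = "to be or not to be"

counts = {}                    # a dict literal, filled in as we go
for word in text.split():      # split() gives a list of words
    counts[word] = counts.get(word, 0) + 1

print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

This is exactly the shape of logic that makes one-file Python scripts so quick to write.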
All along, I've learnt that one needs to know their tools really well. The more you play with them and their features, the better you get at them.. After that all you need is a great product idea, or actually a few great ideas (coz many of your ideas sound great, especially to you). Out of the many great ideas that you execute successfully upon, something will finally end up being great for hopefully many users.. And I, as a programmer, can speak, interact and communicate with my users via these very languages.. :) :)

Friday, July 15, 2011

I scrape as a timepass, and it's fun

'Scraping' sounds negative even by name, though in my view scraping is a real boon to developers when there's no API (REST, XML based, etc.) already published.. REST APIs are very widely adopted. Let's discuss REST first:
REST is basically like this: there are some resources (objects) on a server. These resources can be accessed via the same old HTTP GET, POST, DELETE... (maybe after authentication, which these days is mostly via OAuth). In response to the HTTP request, the server sends back a JSON string. The good thing is JSON is quite readable and can be parsed very easily. So the resources/services on a server can be communicated with from clients via HTTP requests and JSON responses.
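A sketch of the client side of this in Python, with a hand-written JSON string standing in for a real server response (the resource and its fields are invented for illustration):

```python
import json

# Pretend this arrived as the body of a GET /stocks/INFY response
response_body = '{"symbol": "INFY", "price": 1450.5, "tags": ["NSE", "IT"]}'

stock = json.loads(response_body)       # JSON maps straight onto dicts and lists
print(stock["symbol"], stock["price"])  # INFY 1450.5
```

This readability, on both the wire and in code, is a big part of why REST + JSON took over.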
Let's compare that with scraping..
  • When we want to scrape, there's again an HTTP request to a server.
  • The server sends back some HTML, CSS and JS.
Now there are issues with the HTML and the JS content sent.
HTML issues:
  • Browsers seem to handle any hopeless HTML sent.
  • So the content becomes tough to parse, coz people, i.e. the HTML content writers, tend to be sloppy.
If proper HTML standards were followed by every content writer, it would be very easy to parse and use.. But when the HTML is a gone case, we rely on libraries which heuristically parse it and give you nice searchable objects to play with programmatically. The logic behind these libraries is fairly simple..
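Even Python's standard-library parser is tolerant in this spirit; here is a minimal sketch (the sloppy markup below is made up for illustration), with unclosed tags that a strict XML parser would reject outright:

```python
from html.parser import HTMLParser

# Sloppy markup: no closing </li> or </b> tags, as browsers routinely forgive
sloppy = "<ul><li>one<li>two<li><b>three</ul>"

class TextCollector(HTMLParser):
    """Collects every text node the tolerant parser can recover."""
    def __init__(self):
        super().__init__()
        self.texts = []

    def handle_data(self, data):
        self.texts.append(data)

parser = TextCollector()
parser.feed(sloppy)
print(parser.texts)  # ['one', 'two', 'three']
```

Libraries like BeautifulSoup go much further, building a full searchable tree out of markup like this.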

If browsers can parse hopeless HTML to the degree that they still render it, good on them, it implies there is parsing code inside the browser. Mozilla Firefox's source code is open; there's lots to learn there..
But to get a quick scrape done, we really don't need to look into the browser's HTML parsing code. Instead we've got people who've made great libraries, probably by looking at the browser's HTML parsing code.
For my scraping needs I've enjoyed BeautifulSoup when I code in Python and Jsoup when I code in Java. Playing with these libraries is sheer fun.. To get a feel for how good BeautifulSoup is, check this code that I wrote a long time back. It gets all Indian Railways train information, like schedules and stoppages of different trains, from http://www.indianrail.gov.in/ and makes SQL queries to persist the data.

Ok, so now that we've established that HTML parsing is largely a solved issue, let's talk about the JavaScript part of the HTTP response..
  • Scraping static pages is easy, as explained above.
  • Today's pages are dynamic, and AJAX calls really make the user experience better.
So if we are scraping a dynamic page, we need to understand and render the JavaScript code too.. Whoa.. But just think of it in terms of the browser..
"The browser understands JS and Ajax and renders the content received from AJAX calls."
But this case is very different from the HTML issue.. The JS code interacts with the HTML and CSS, so now we need to emulate the whole browser itself. That is where COM (Component Object Model) objects enter the picture. We basically have to automate a full browser, and COM helps in doing that. Check out how to programmatically use the Internet Explorer COM object with some Python code:
from win32com.client import Dispatch  # needs the pywin32 package; Windows only

ie = Dispatch("InternetExplorer.Application")  # launch IE as a COM automation object
ie.Visible = True  # do stuff with the ie object, e.g. ie.Navigate(url), then read ie.Document
Ok, so scraping dynamic pages is tougher than scraping static pages.. But it can be done.
Also, many a time one can get the right data by finding out the URL to which the AJAX call goes and then using the response received directly.
There's still a lot of functionality which can be achieved from the many static pages still available..
For example:
Visualizing realtime stock data is crucial for traders. In many of the stock tracking websites available, the stock data being viewed is some 2 minutes behind the actual time. There are also websites where an HTTP request to the server for a stock returns its realtime price.. But the problem there is that the page has to be refreshed again and again to check the current price.. Also, in one web page in the browser you can check only one stock at a time..
So a friend of mine (he is a big market lover :) ) and I tried to solve the above two issues, and this was the minimum functionality that had to be built:
  1. Choose the companies from NSE that the user wants to track. 
  2. Aggregate all the stock prices of the companies in one page.
  3. Auto refresh the prices via AJAX every 10 seconds or so.
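The aggregate-and-poll idea in the list above, sketched in Python rather than the GWT/JS the app actually used (fetch_price and its symbols are stand-ins; a real version would make the HTTP request to the quote URL):

```python
import time

def fetch_price(symbol):
    """Stand-in for the real HTTP call that returns a stock's current price."""
    return {"INFY": 1450.5, "TCS": 3100.0}[symbol]

def poll(symbols, rounds=1, interval=0.0):
    """Aggregate all tracked symbols into one snapshot, refreshing every `interval` seconds."""
    for _ in range(rounds):
        snapshot = {s: fetch_price(s) for s in symbols}
        print(snapshot)
        time.sleep(interval)  # the real app used a 10-second AJAX timer instead
    return snapshot

poll(["INFY", "TCS"])  # {'INFY': 1450.5, 'TCS': 3100.0}
```

In the browser version, the loop is a timer that re-issues the AJAX request and repaints the single aggregated page.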
So we made a simple and basic realtime stock monitoring application which achieves the above. It uses GWT, and Google App Engine for hosting, which are some interesting things to talk about.. Probably for some other post. Even though the user interface isn't good, it still serves the purpose.

Ok, so scraping static pages can be visualized as an API in itself, albeit one whose content will change with time.. And the chance that the scraping code breaks when the content changes is always there.. One small tip to avoid breakages to some extent: use CSS class names, if they exist, while scraping. That way, hopefully, if the page author wants to restyle, he/she will change the CSS properties of that class instead of the class name..
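That class-name tip can be sketched with Python's standard-library parser (the page snippet and the class name "price" are invented for illustration; real scraping code would use BeautifulSoup or Jsoup as mentioned earlier):

```python
from html.parser import HTMLParser

# Hypothetical snippet of the page being scraped
page = ('<div class="price">101.5</div>'
        '<div class="note">data is delayed</div>'
        '<span class="price">99.2</span>')

class ClassScraper(HTMLParser):
    """Grabs the text inside any tag carrying the given CSS class."""
    def __init__(self, cls):
        super().__init__()
        self.cls = cls
        self.in_match = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        self.in_match = self.cls in classes

    def handle_endtag(self, tag):
        self.in_match = False

    def handle_data(self, data):
        if self.in_match:
            self.found.append(data)

scraper = ClassScraper("price")
scraper.feed(page)
print(scraper.found)  # ['101.5', '99.2']
```

Because the scraper keys off the class name, a restyle that only touches the CSS rules for .price leaves it working.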

Finally, scraping is fun for developers like me coz it helps solve many interesting problems.. But think of how the search engine giants like Google work. They rely big time on scraping and crawling.. Their bots literally crawl the whole web and scrape data, based on their algos, to help their servers index and search.. So scraping itself is a huge revenue generator too, but for now I will enjoy the fun part of scraping.. Hopefully some time in the future I'll scrape to create revenue.. :)


P.S. I scrape just for timepass.. Please don't sue me.. ;)

Friday, April 22, 2011

Unix + Marketing

I had a friend who had posted this:
The Microsoft vs. Linux war didn’t help with the disdain toward marketing among software geeks, hackers and slashdotters. It feels good to believe that Windows succeeded mostly because of marketing and money spent by the mega-corporation. If Linux had the marketing muscle of Windows, it would rule the world.

But I thought otherwise, coz the reality is as contradictory as it looks to the normal eye..

Actually..
Mac OS X is a Unix-based OS, and I guess even Apple has spent huge cash on marketing the Mac. In my view Apple has a marketing-overload kind of strategy, i.e. they sell everything and anything they can; they add value to some open source product and pitch it fervently to the market to earn cash. Remember, the only way not-so-geeky chaps could download music on a Mac or any Apple device was iTunes, which proves that they are marketing pros. Also, Android is based on a modified Linux kernel. Same goes for Google Chrome OS. So even Google has taken Unix + marketing seriously, to counter two huge requirements, i.e. the mobile space and the cloud.. iOS, which powers all those Apple devices, i.e. the iPhones and the iPads, is again Unix-based.. So Unix + marketing has been tried for a long time, with some success.. iOS / Android / Chrome OS might give a definitive answer to the question in the coming future.. Ultimately all these upcoming OSes have their roots in Unix, so Unix is already ruling the world with whatever marketing has been done.

Sunday, July 4, 2010

Batch file miseries Part 1 !!

Batch files are useful:
  1. Coz you know if it's some reasonably contemporary Windows, it will work. Obviously the batch file must have been written correctly..
  2. Running bat files doesn't require any separate tool installation etc. Unless you've got a messed-up Windows system which has cmd.exe replaced, or the PATH variable messed up, or many other things ;)
  3. The for loops, if conditionals, variable assignments etc. needed to script are already there.
  4. Using bat files is a much faster way to browse and work with your machine than using the mouse.
But.. there are a number of dangerous buts which come up when it comes to batch file scripting..

Batch file scripting sucks big time
e.g. a simple bat file.. Arrg.. Not that simple, coz you'll run into a lot of "was unexpected at this time." and all other kinds of errors.

@echo off

set TESTVARIABLE=init

if "%TESTVARIABLE%"=="init" (
    set TESTVARIABLE=used
    if "%TESTVARIABLE%"=="used" (
        echo This was logical
    ) else (
        echo Bat files suck big time..
    )
)


Guess what the output would be?
Bat files suck big time :D

The good thing is that this problem has a solution, even though it's rather dirty considering it's such an elementary problem for anyone who wants to script batch files.

Solution:
  1. Check for the registry key "HKLM\Software\Microsoft\Command Processor\DelayedExpansion".
  2. If it doesn't exist, add it using regedit or with this command: reg add "HKLM\Software\Microsoft\Command Processor" /v DelayedExpansion /t REG_DWORD /d 1
  3. Just in case you want the above script to work logically only when you ask for it, invoke cmd.exe with the flag /v:ON, i.e. cmd.exe /v:ON .. At all other times the default Windows behavior of sucking with bat files will continue :P
  4. To enable the logical behavior only for a particular scope in a batch file, use setlocal ENABLEDELAYEDEXPANSION. That way the logical behavior lasts until the bat file execution is over (or until a matching endlocal).
The HKEY_CURRENT_USER hive can also be changed instead of HKLM if your machine is a multi-user machine and you don't want this change to affect other users.

There's one more gotcha:

Once delayed expansion is enabled using the above methods, variables set inside a parenthesized block are retrieved by enclosing them in ! instead of %.

So the bat file also needs to be changed, and it becomes this:

@echo off
setlocal ENABLEDELAYEDEXPANSION

set TESTVARIABLE=init

if "%TESTVARIABLE%"=="init" (
    set TESTVARIABLE=used
    if "!TESTVARIABLE!"=="used" (
        echo This was logical
    ) else (
        echo Bat files suck big time..
    )
)

So that was one of the hopeless default settings of Windows XP. There are many other hopeless ways in which the Windows bat file interpreter makes life bad for people scripting in bat files.. Hence Python and other scripting languages are a requirement..