Monday, February 11, 2013

84 problems applied to software engineers



I've been in Software Engineering for 17 years, and let me tell you this:- In this field, you are never going to be perfect. The feeling that you have that you don't know this or that technology, or there's more you can do never goes away. It's a good thing too, because that's what keeps moving you forward. However, at the same time, you should look at what you have accomplished too.

Reminds me of a Buddhist parable:- One farmer goes to the Buddha and tells him that he is unhappy. Buddha asks him what he is unhappy about. The farmer starts talking about his kids, his cows, his wife, everything. Buddha tells him "In life, you are always going to have 84 problems. You solve one problem, and another one will pop one up in it's place" . The farmer says "How does this help me solve my problems?" Buddha says "It doesn't help you with the 84 problems" Farmer says" I heard you were a great saint. But you haven't solved any of my problems" Buddha says "But, I have solved your 85th problem, which is the biggest problem" Farmer says "What's the 85th problem?" Buddha "Your 85th problem is that you expect to have no problems at all"

A software engineer's career is also like that. There are 84 more technologies, or tools that you can learn. You learn one and another one pops up in it's place. You cannot let what-you-don't-know define you. You have to define yourself by your accomplishments. Doesn't mean that you don;t stop moving forward. It just means that when you put your head on a pillow at night, you sleep with a feeling that you did something good today.

That's what you have to focus on. Every night, go to bed thinking of the things you did good today. Every morning,  think of things you can do better today. That's all. Simple, probably a little stupid, but highly effective

Monday, September 3, 2012

Iterative and test driven methodology for writing complicated SQL queries

Iterative development

A lot of software developers  are familiar with writing code iteratively. Basically, you break down the problems into modules, and then break those modules down into finer modules till you get to the point where each module is small enough. This allows the developers to take "bite-sized" pieces of the problems, and concentrate on them one by one. This makes the code easy to develop, maintain and test. Anopther advantage of this approach is that each module can be tested independently, and you build test cases for each module as you go along.

Now, this approach works naturally with procedural languages. When you break the problem into discrete steps, you can divide the problems into modules easily. The problem is when developers who have been trained to work in procedural languages like Java are asked to write SQL. SQL is a declarative language. You tell the database what tables you want the data from, and what the relationships between the tables are. You cannot think procedurally when you are building SQL. The issue is that a lot of procedural developers forget to develop iteratively when they step outside of their comfort zone. They try to build the entire SQL statement in one shot, and stop thinking about how to test the SQL. I am presenting the approach that I take to develop complicated SQL queries iteratively.

Thinking Procedurally vs Thinking declaratively

Let's say you have a table that stores employees, and another table that stores their addresses. You want to get all the addresses of a given employee. Someone who thinks procedurally will break down the problem into 2 parts:- a) first get the employee id the given employee b) next get the address records that match the employee id

A declarative thinker, OTH, will not think of the problem as a series of steps. S/he will think of entities and the relationship between the entities. She will think "An Employee has many Addresses. This is modeled by putting the employee id in the adress table. So, I need to JOIN the  Employee table with the address table to get the addresses of all employees. If I add a where clause on employee name, we get the address of the given employee" The declarative thinker is not breaking the problem in a series of steps. The declarative thinker is thinking of relationships, and how the whole thing connects together

Challenges with developing iteratively in declarative languages

Iterative development comes naturally with procedural languages. Since, part of analyzing the problem procedureally is breaking down into bite sized pieces, developing those small pieces independently is natural.  

However, when you are developing declaritively, and working on a problem that requires data from 7 different entities, and all these entities with 13 relationships between them, it's very difficult to keep everything in your head. Even if you do, it's easy to make a mistake. 

Methodology

So, how do we develop a SQL iteratively? By brekaing it down into smaller pieces! How do you break down a query that needs 7 entities and 13 relationships? By starting with a query that needs 2 entities and 1 relationship, and then growing the query.

This is the approach that I use

  1. Write down the columns that you need in your output
  2. Write down the columns that you need in your input(or rather the column that you want to filter by)
  3. Against each output column, write down the table that the data will come from
  4. Against each input column, write down the table that contains that column. Generally, the input tables and output tables have some tables in common. However, there might be cases where the set of input tables and output tables are completely distinct
  5. Pick one output table. If you have multiple outputtables, pick the one that will generally have the least amount of data, or pick the one that has most number of columns that you are going to filter by. Remember that the database is goign to perform better if it has to operate on a smaller amount of rows. So, start with the table that returns the least amount of rows first
  6. Write the  select query with just that output table. Don't worry about all the other output columns yet. Just write a select query that returns the columns from that table, and filters on the input columns. This is your seed query. You are going to build the final query on top of this
  7. Test this query out. This very important. Do not jump ahead to the other tables unless you test this out. In your database, add test data that should be returned by the query. Alos, add negative test data; data that shouldn't be returned by your query. For example if you are writing a query that gets an employee by rank, add 2 employees:- one that matches the given rank, and one that doesn't. At this point, it might seem overkill to add test data for a trivial query. However, remember that this query is your seed query. It's going to grow into the final query. Along with the seed query, you need seed test data. Any test data that you define here will be useful as your query gets more complicated
  8. If you have performance concerns, run explain plan at this point.. At this point, you should be pretty confident about what you have written. It should do one thign and it should do it well. It may not do all that you need, but it should do it well
  9. Pick the next  logically related table. If none of the tables are logically related, find the table that is closest, and the intermediate tables that will help you get to that table
  10. Grow your query to include the next logically related table. Add Joins between the tables in the query and the next set of logically related tables. Add columns to the select clause; add where/group by clauses
  11. Now, your query has grown a bit. Time to test this grown query. You already had test data for the table in your seed query. Don;t lose that data. Add data to new tables that relates to the records in the old table. Remember to add negative test data in new tables. Also, remember that you had added negative test data in the old table. Make sure you add records to new tables that relates to the negative records in old tables. Testing with this test data will ensure that you haven't forgotten a clause that will cause your query to pull data that you don't need.
  12. Run Explain Plan again if required.
  13. Repeat steps 9-11 for all tables till all tables are done
At the end of this process, you will have a kick-ass set of test data. And you have the satisfaction of knowing that your query not only works, it works in cases where the data doesn't make sense. If you are writing Java unit tests, you have ready made test data that you can use in your unit tests.

Please note:- It's very tempting to test your query by modifying the same set of test data over and over again to test different cases. Let's say you have a OR condition in your where clause:- A lot of developers insert a record that satisfies one side of the OR condition. Once, they are satisfied that the query works in this cases, they modify that record so that it satisfies the other side of the OR condition. Don't do this!! Once you have built test data, don;t lose it!. Instead add a new test record. This ensures that the test case that you tested will be tested again in the next iteration of your query.


Friday, April 23, 2010

DB migrate

http://code.google.com/p/dbmigrate/


Similar to 'rake migrate' for Ruby on Rails this library lets you manage database upgrades for your Java applications.

CLI parser

Nice Command line parser for java

http://code.google.com/p/cli-parser/

Using annotations you can make very succinct main methods that don't need to know how to parse command line arguments with either fields, properties, or method based injection.

Very simple to use, very small dependency.

Friday, September 4, 2009

Debugging a single test through maven

Many times, your maven build fails because a unit test fails. SO, if want to debug the test you have 2 options

a) Run the test from eclipse> This is good for simple apps that do not need depend on 3rd party services . But, if your app relies on other 3rd party services starting up, chances are you have already configured maven to start those services when the unit tests are running. Now, if you have to run the same tests in eclipse, you will have to configure eclipse to start those services, and probably you will have to start the services from eclipse manually. Some developers choose this way, and there's nothing wrong with it, if it's easier for you to start everything from eclipse. However, remember that if you are working in a large team, other team members might change the configuration of the 3rd party servers in maven, and you are responsible for making sure the environment that you setup in eclipse reflects the environment in maven

b) Run the test from maven and remotely connect to maven from eclipse. If you want to run particulat test do like this
mvn -Dmaven.surefire.debug -Dtest= nameof test class test

This will run the test phase (along with all the other related phases), and just before it runs the surefire plugin, it will suspend execution and wait for debugger to connect on port 5005. Go to eclipse and srtup a remote debug configuration to port 5005 and start debugging. Your test will run, and stop at any breakpoints

Friday, July 31, 2009

Creating a Fedora machine with MYSQL on Amazon

  1. Login to AWS console
  2. Start a machine with AMI:- ami-2547a34c
  3. Make sure required ports are open in the security group that you used
    • MYSQL:-3306
    • SSH:- 22
  4. After the server is up, login to the server using ssh
  5. Install the mysql server :-
     yum -y install mysql mysql-devel mysql-server
  6. start the server:-
    /etc/init.d/mysqld start
  7. By default, mysql doesn;'t provide any password for root. This is dangerous, so first thing you do is set a password:-
    mysqladmin -u root password 
  8. By default root has permissions to only login from localhost. So, give permission to root to login from anywhere:-
    mysql -u root -p

    grant all on *.* to root@'%'
That's it.. now you can login to MYSQL from your local machine using Toad or any other client

Maven and selenium

Good article on how to setup maven with selenium here