Tuesday, June 25, 2013

Excel VLookup

Excel vlookup can be used to lookup a key and print its corresponding value. Very convenient when you quickly need to show desc from a set of values

Formula : =vlookup(col to lookup,array range to lookup, col num of the rest, TRUE = approximate match, FALSE = exact match)

Also to keep the range from changing when you copy over your formula to other cells,

When you enter the range in your formula, to keep the range from changing, you will need to enter the range as an Absolute Reference.  By that I mean, the range will be absolute, or will not change, when you extend the formula from the first cell to the rest.

To make the range Absolute, put a Dollar Sign ($)in front of both the column letter and the row number in the cell range.  For example, if the cell range is D1:F100 (Columns D, E, and F, from rows 1 through row 100), to make it an absolute range, so it will not change, enter the range in your formula as: $D$1:$F$100.  Now when you copy the formula down to subsequent rows, the range of D1:F100 will remain constant, or Absolute, in each of the formulas as they are copied down,and will not change as you copy the formula down.


Monday, June 3, 2013

Java HashMap with millions and millions of records

I am just trying to find an answer to this problem. If you have millions and millions of records stored in a map, you are sure to run out of memory. Here is some alternates..

MapDB
JDBM2
BERKELEY DB (Serialization overhead)
HSQLDB 

You can also use effective hashing algorithms like :

Buzz Hash
http://www.java2s.com/Code/Java/Development-Class/AveryefficientjavahashalgorithmbasedontheBuzHashalgoritm.htm

Hash History
http://www.serve.net/buz/hash.adt/java.000.html


Some basics on HashMap

An instance of HashMap has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets.

Your problem is that 1.7 GB raw Text is more than 1500 MB even without the overhead added by the individual string objects. For huge mappings you should either use a database or a file backed Map, these would use disk memory instead of heap.


JDBM2 is exactly what you are asking. It provides a HashMap backed up by disk storage (among other maps). Its fast, thread-safe and the API is really simple.

MapDB provides concurrent TreeMap and HashMap backed by disk storage or off-heap-memory. It is a fast, scalable and easy to use embedded Java database engine. It is tiny (~250KB jar), yet packed with features such as transactions, space efficient serialization, instance cache and transparent compression/encryption. It also has outstanding performance rivaled only by native embedded db engines.


  1. As discussed at length in other posts, with a good hashCode(), 26M entries in a Map is no big deal.
  2. However, a potentially hidden issue here is GC impact of giant maps.
I'm making an assumption that these maps are long lived. i.e. you populate them and they stick around for the duration of the app. I'm also assuming the app itself is long lived -- like a server of some sort.
Each entry in a Java HashMap requires three objects: the key, the value, and the Entry that ties them together. So 26M entries in the map means 26M * 3 == 78M objects. This is fine until you hit a full GC. Then you've got a pause-the-world problem. The GC will look at each of the 78M objects and determine they're all alive. 78M+ objects is just a lot of objects to look at. If your app can tolerate occasional long (perhaps many seconds) pauses, there's no issue. If you're trying to achieve any latency guarantees you could have a major issue (of course if you want latency guarantees, Java's not the platform to choose :)) If the values in your maps churn quickly you can end up with frequent full collects which compounds the problem greatly.
I don't know of a great solution to this issue. Ideas:
  • It's sometimes possible to tune GC and heap sizes to "mostly" prevent full GCs.
  • If your map contents churn a lot you could try Javolution's FastMap -- it can pool Entry objects, which could lower the frequency of full collects
  • You could create your own map impl and do explicit memory management on byte[] (i.e. trade cpu for more predictable latency by serializing millions of objects into a single byte[] -- ugh!)
  • Don't use Java for this part -- talk to some sort of predictable in-memory DB over a socket
  • Hope that the new G1 collector will help (mainly applies to the high-churn case)
Just some thoughts from someone who has spent a lot of time with giant maps in Java.

Friday, May 10, 2013

Starting JBoss from Eclipse using JBoss Tools

Step by Step guide to start JBoss from your Eclipse IDE (Indigo or higher) I have juno

https://docs.jboss.org/author/display/AS7/Starting+JBoss+AS+from+Eclipse+with+JBoss+Tools

Thursday, February 14, 2013

Exclude certain files from copy

We just want to exclude .err files from being coped over. This is an awesome command that does the trick in just one line ;)

find $UNIX_APPS/cla/$OUTDIR ! -name "bank_loan_accruals*err" -name "bank_loan_accruals*" |xargs -i cp {} /develop/rayudsr  

Tuesday, February 12, 2013

How to create java project from SVN project in eclipse

 you should right click on the project in the package explorer in eclipse and then select to delete it without removing directory or its contents. Next, you select to create a Java project (File -> New -> Java Project) and in the Contents part of the New Java Project dialog box, select 'Create project from existing source'.
The advantage this approach is that source folders will be properly identified. I found that mucking around with the .project file can lead to the entire directory being considered a source folder which is not what you want.

Thursday, January 17, 2013

qlik view - show current selection in chart header

Expression to show current selections in Chart header

=' Cabaret Volue Stats: ' & if(isnull(GetCurrentSelections()), 'Nothing selected', GetCurrentSelections('

', ': ', ','))