I'm seeing the following exception in search.log (on the Hunk search head) when I have Snappy-compressed (.snappy) files in my virtual index directory:
SplunkMR$SearchHandler$1 - native snappy library not available
java.lang.RuntimeException: native snappy library not available
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:189)
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:125)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:78)
at com.splunk.mr.input.SplunkLineRecordReader.vixInitialize(SplunkLineRecordReader.java:18)
at com.splunk.mr.input.BaseSplunkRecordReader.initialize(BaseSplunkRecordReader.java:49)
at com.splunk.mr.SplunkMR$SplunkBaseMapper.stream(SplunkMR.java:512)
at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:1012)
at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:1024)
at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:1021)
at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.addStatus(VirtualIndex.java:185)
at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.listStatus(VirtualIndex.java:216)
at com.splunk.mr.input.VirtualIndex.listStatus(VirtualIndex.java:553)
at com.splunk.mr.input.SplunkInputFormat.acceptFiles(SplunkInputFormat.java:181)
at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:1037)
at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:1166)
at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:1085)
at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1380)
at com.splunk.mr.SplunkMR.run(SplunkMR.java:1222)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.splunk.mr.SplunkMR.main(SplunkMR.java:1392)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
In order to provide quick previews, Hunk processes a few file splits on the search head. To process .snappy compressed files, you need the appropriate Snappy libraries set up on the search head (in addition to having them already set up in Hadoop). To set up Snappy on the search head, follow these steps:
Here's what the contents of that directory look like for me after adding Snappy:
[lbitincka@ronnie hadoop]$ ls -lah lib/native/Linux-amd64-64/
total 1.3M
drwxr-xr-x 2 lbitincka 1125 4.0K Oct 10 14:42 .
drwxr-xr-x 3 lbitincka 1125 4.0K Aug 19 18:33 ..
-rw-r--r-- 1 lbitincka 1125 424K Aug 19 18:29 libhadoop.a
-rw-r--r-- 1 lbitincka 1125 1004 Aug 19 18:29 libhadoop.la
-rw-r--r-- 1 lbitincka 1125 227K Aug 19 18:29 libhadoop.so
-rw-r--r-- 1 lbitincka 1125 227K Aug 19 18:29 libhadoop.so.1
-rw-r--r-- 1 lbitincka 1125 227K Aug 19 18:29 libhadoop.so.1.0.0
lrwxrwxrwx 1 lbitincka 1125 18 Oct 10 14:42 libsnappy.so -> libsnappy.so.1.1.4
lrwxrwxrwx 1 lbitincka 1125 18 Oct 10 14:42 libsnappy.so.1 -> libsnappy.so.1.1.4
-rwxr-xr-x 1 lbitincka 1125 126K Oct 10 14:25 libsnappy.so.1.1.4
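The symlink layout above can be recreated with a couple of `ln -s` commands. Here is a minimal sketch that builds the same structure in a scratch directory (the scratch directory and the empty stand-in file are assumptions for illustration; on a real search head you would copy the actual libsnappy.so.1.1.4 into $HADOOP_HOME/lib/native/Linux-amd64-64/):

```shell
# Recreate the libsnappy symlink layout shown above in a scratch directory.
# The stand-in file is empty; on a real search head, copy the real shared library.
dir=$(mktemp -d)
touch "$dir/libsnappy.so.1.1.4"
chmod 755 "$dir/libsnappy.so.1.1.4"
ln -s libsnappy.so.1.1.4 "$dir/libsnappy.so"
ln -s libsnappy.so.1.1.4 "$dir/libsnappy.so.1"
ls -l "$dir"
```

Note the relative symlink targets: they resolve within the same directory, so the layout keeps working even if the directory is moved.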
When running searches against .snappy data, you should also see log lines like these in search.log:
10-10-2013 14:26:03.736 WARN ERP.hunk - LoadSnappy - Snappy native library is available
10-10-2013 14:26:03.737 INFO ERP.hunk - NativeCodeLoader - Loaded the native-hadoop library
10-10-2013 14:26:03.737 INFO ERP.hunk - LoadSnappy - Snappy native library loaded
10-10-2013 14:26:03.744 INFO ERP.hunk - CodecPool - Got brand-new decompressor
Google developed the Snappy algorithm in C (https://code.google.com/p/snappy/), and it has been ported to Java by various third parties. For more information, see http://blog.cloudera.com/blog/2011/09/snappy-and-hadoop/.
Hadoop ships with https://code.google.com/p/hadoop-snappy/, which is missing from your Hadoop client classpath. In my Apache Hadoop 2.0.5-alpha, the SnappyCodec class is in hadoop-common-2.0.5-alpha.jar:
[hyan@ronnie hunk-staging]$ jar -J-Xmx512m -tf /mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/common/hadoop-common-2.0.5-alpha.jar |grep Snappy
org/apache/hadoop/io/compress/snappy/SnappyDecompressor.class
org/apache/hadoop/io/compress/snappy/SnappyCompressor.class
org/apache/hadoop/io/compress/SnappyCodec.class
Here is my Hadoop classpath:
[hyan@ronnie hunk-staging]$ /mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/bin/hadoop classpath | sed 's/:/\n/g'
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/etc/hadoop
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/common/lib/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/common/*
/contrib/capacity-scheduler/*.jar
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/hdfs
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/hdfs/lib/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/hdfs/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/yarn/lib/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/yarn/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/mapreduce/lib/*
/mnt/big/hyan/hadoop/hadoop-2.0.5-alpha/share/hadoop/mapreduce/*
The Snappy Java library uses the native C library, so those native libraries need to be on the native library path (set by the 'java.library.path' Java system property). This property is set by one of the Hadoop config files before the bin/hadoop script runs, and it can be overridden by the user. The default location seems to differ between Hadoop versions/distros; for Hadoop 1.0.3, as Ledion showed, it is under $HADOOP_HOME/lib/native/. For example:
vix.env.HADOOP_CLIENT_OPTS = -Djava.library.path=/mnt/big/hyan/local/lib/
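In Hunk, this setting typically lives in the provider stanza in indexes.conf. A hedged sketch (the provider name is a placeholder; the paths are the ones used in this thread):

```ini
[provider:my-hadoop-provider]
vix.env.HADOOP_HOME = /mnt/big/hyan/hadoop/hadoop-2.0.5-alpha
vix.env.HADOOP_CLIENT_OPTS = -Djava.library.path=/mnt/big/hyan/local/lib/
```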
Note: if you installed your Hadoop client from a Hadoop tar file on your search head, the native Hadoop library ($HADOOP_HOME/lib/native/libhadoop.so) most likely does not work with Snappy (because it is an optional component). You may need to rebuild the Hadoop native library together with the Snappy component.
Here is my native lib directory:
[hyan@ronnie hunk-staging]$ ls -l /mnt/big/hyan/local/lib/
total 66028
-rw-r--r-- 1 hyan games 124788 Oct 3 16:42 libcontainer.a
-rw-r--r-- 1 hyan games 753980 Oct 3 16:42 libhadoop.a
-rw-r--r-- 1 hyan games 1482396 Oct 3 16:42 libhadooppipes.a
-rwxr-xr-x 1 hyan games 412640 Oct 3 16:42 libhadoop.so
-rwxr-xr-x 1 hyan games 412640 Oct 3 16:42 libhadoop.so.1.0.0
-rw-r--r-- 1 hyan games 579600 Oct 3 16:42 libhadooputils.a
-rw-r--r-- 1 hyan games 268948 Oct 3 16:42 libhdfs.a
-rwxr-xr-x 1 hyan games 177913 Oct 3 16:42 libhdfs.so
-rwxr-xr-x 1 hyan games 177913 Oct 3 16:42 libhdfs.so.0.0.0
-rw-r--r-- 1 hyan games 44066 Oct 3 16:42 libnative_mini_dfs.a
-rw-r--r-- 1 hyan games 17230 Oct 3 16:42 libposix_util.a
-rw-r--r-- 1 hyan games 17648118 Oct 2 15:34 libprotobuf.a
-rwxr-xr-x 1 hyan games 1003 Oct 2 15:34 libprotobuf.la
-rw-r--r-- 1 hyan games 1947834 Oct 2 15:34 libprotobuf-lite.a
-rwxr-xr-x 1 hyan games 1038 Oct 2 15:34 libprotobuf-lite.la
lrwxrwxrwx 1 hyan games 25 Oct 2 15:34 libprotobuf-lite.so -> libprotobuf-lite.so.7.0.0
lrwxrwxrwx 1 hyan games 25 Oct 2 15:34 libprotobuf-lite.so.7 -> libprotobuf-lite.so.7.0.0
-rwxr-xr-x 1 hyan games 893019 Oct 2 15:34 libprotobuf-lite.so.7.0.0
lrwxrwxrwx 1 hyan games 20 Oct 2 15:34 libprotobuf.so -> libprotobuf.so.7.0.0
lrwxrwxrwx 1 hyan games 20 Oct 2 15:34 libprotobuf.so.7 -> libprotobuf.so.7.0.0
-rwxr-xr-x 1 hyan games 7324725 Oct 2 15:34 libprotobuf.so.7.0.0
-rw-r--r-- 1 hyan games 25882136 Oct 2 15:34 libprotoc.a
-rwxr-xr-x 1 hyan games 1028 Oct 2 15:34 libprotoc.la
lrwxrwxrwx 1 hyan games 18 Oct 2 15:34 libprotoc.so -> libprotoc.so.7.0.0
lrwxrwxrwx 1 hyan games 18 Oct 2 15:34 libprotoc.so.7 -> libprotoc.so.7.0.0
-rwxr-xr-x 1 hyan games 9071244 Oct 2 15:34 libprotoc.so.7.0.0
-rw-r--r-- 1 hyan games 207892 Oct 3 16:42 libsnappy.a
-rwxr-xr-x 1 hyan games 962 Oct 3 16:42 libsnappy.la
lrwxrwxrwx 1 hyan games 18 Oct 2 15:42 libsnappy.so -> libsnappy.so.1.1.4
lrwxrwxrwx 1 hyan games 18 Oct 2 15:42 libsnappy.so.1 -> libsnappy.so.1.1.4
-rwxr-xr-x 1 hyan games 128527 Oct 3 16:42 libsnappy.so.1.1.4
drwxr-xr-x 2 hyan games 4096 Oct 2 15:34 pkgconfig
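A quick sanity check that the directory you point -Djava.library.path at actually contains the Snappy shared library. The helper function name is just for illustration, and the example path is the one used above:

```shell
# Check whether a directory contains the native Snappy library.
check_snappy() {
  if ls "$1"/libsnappy.so* >/dev/null 2>&1; then
    echo "snappy native library present in $1"
  else
    echo "snappy native library MISSING in $1"
  fi
}

# Example usage with the path from this thread:
check_snappy /mnt/big/hyan/local/lib
```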
You can verify whether Snappy already works with your Hadoop client by running:
$HADOOP_HOME/bin/hadoop fs -text s3n://[REDACTED]:[REDACTED]@wp-dw-source/omni/site=washpostcom/dt=20090616/a088a0a4-3a18-4dae-a551-ecf25a855367_000389.snappy
FYI, Snappy support was added in Hadoop 0.23 and backported by the distributions into their versions of Hadoop. This means it's nearly impossible to grab a vanilla Apache Hadoop client and add Snappy support before Hadoop 2.0. This was ultimately resolved by grabbing Amazon's Hadoop bits from /home/hadoop on the EMR master node, copying them over to the machine where we had installed Hunk, and using those for the Hadoop client utilities.
This error is different from the original error (which concerned the native libraries not being found). In this case we're not finding the SnappyCodec Java class:
ERROR ERP.EMR - Caused by: java.lang.ClassNotFoundException:
ERROR ERP.EMR - org.apache.hadoop.io.compress.SnappyCodec
This is most likely caused by the SnappyCodec not being present in one of the classpath jars. You can verify that using the following command:
$HADOOP_HOME/bin/hadoop classpath | sed 's/:/\n/g' | grep '.jar$' | xargs -n1 jar -J-Xmx512m -tf | grep Snappy
If Snappy is present, you should see output like the following:
org/apache/hadoop/io/compress/SnappyCodec.class
org/apache/hadoop/io/compress/snappy/LoadSnappy.class
org/apache/hadoop/io/compress/snappy/SnappyCompressor.class
org/apache/hadoop/io/compress/snappy/SnappyDecompressor.class
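If those classes are missing, one way to get them onto the client classpath is the HADOOP_CLASSPATH environment variable, which the bin/hadoop script appends to its classpath. A hedged sketch (the jar path below is a placeholder; use the jar that actually contains SnappyCodec in your distro):

```shell
# Append the jar that contains org.apache.hadoop.io.compress.SnappyCodec
# to the Hadoop client classpath. The jar path below is a placeholder.
export HADOOP_CLASSPATH="${HADOOP_CLASSPATH:+$HADOOP_CLASSPATH:}/path/to/hadoop-common.jar"
echo "$HADOOP_CLASSPATH"
```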
Ledion,
As per the instructions, I copied libsnappy.so to the $HADOOP_HOME/lib directory and restarted my Hadoop cluster and the Splunk search head. But I still see the following error:
10-15-2013 16:33:43.919 ERROR ERP.EMR - Warning: $HADOOP_HOME is deprecated.
10-15-2013 16:33:46.405 INFO ERP.EMR - SplunkMR$SearchHandler - Reduce search: null
10-15-2013 16:33:46.418 INFO ERP.EMR - SplunkMR$SearchHandler - Search mode: stream
10-15-2013 16:33:47.388 INFO ERP.EMR - SplunkMR$SearchHandler - Created filesystem object, elapsed_ms=967
10-15-2013 16:33:47.528 INFO ERP.EMR - VirtualIndex - listStatus started, vix.name=omniture ...
10-15-2013 16:33:49.153 WARN ERP.EMR - SplunkMR$SplunkBaseMapper - Could not create preprocessor object, will try the next one ... class=com.splunk.mr.input.ValueAvroRecordReader, message=File path does not match regex to use this record reader, name=[REDACTED]
10-15-2013 16:33:49.168 ERROR ERP.EMR - SplunkMR$SearchHandler$1 - Compression codec
10-15-2013 16:33:49.168 ERROR ERP.EMR - org.apache.hadoop.io.compress.SnappyCodec
10-15-2013 16:33:49.168 ERROR ERP.EMR - not found.
10-15-2013 16:33:49.168 ERROR ERP.EMR - java.lang.IllegalArgumentException: Compression codec
10-15-2013 16:33:49.168 ERROR ERP.EMR - org.apache.hadoop.io.compress.SnappyCodec
10-15-2013 16:33:49.168 ERROR ERP.EMR - not found.
10-15-2013 16:33:49.168 ERROR ERP.EMR - at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at org.apache.hadoop.io.compress.CompressionCodecFactory.
10-15-2013 16:33:49.168 ERROR ERP.EMR - at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:62)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.input.SplunkLineRecordReader.vixInitialize(SplunkLineRecordReader.java:18)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.input.BaseSplunkRecordReader.initialize(BaseSplunkRecordReader.java:49)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.SplunkMR$SplunkBaseMapper.stream(SplunkMR.java:512)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:1001)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:1013)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.SplunkMR$SearchHandler$1.accept(SplunkMR.java:1010)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.addStatus(VirtualIndex.java:185)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.input.VirtualIndex$VIXPathSpecifier.listStatus(VirtualIndex.java:216)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.input.VirtualIndex.listStatus(VirtualIndex.java:553)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.input.SplunkInputFormat.acceptFiles(SplunkInputFormat.java:169)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.SplunkMR$SearchHandler.streamData(SplunkMR.java:1026)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.SplunkMR$SearchHandler.executeImpl(SplunkMR.java:1152)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.SplunkMR$SearchHandler.execute(SplunkMR.java:1074)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.SplunkMR.runImpl(SplunkMR.java:1370)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.SplunkMR.run(SplunkMR.java:1208)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at com.splunk.mr.SplunkMR.main(SplunkMR.java:1382)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at java.lang.reflect.Method.invoke(Method.java:616)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
10-15-2013 16:33:49.168 ERROR ERP.EMR - Caused by: java.lang.ClassNotFoundException:
10-15-2013 16:33:49.168 ERROR ERP.EMR - org.apache.hadoop.io.compress.SnappyCodec
10-15-2013 16:33:49.168 ERROR ERP.EMR -
10-15-2013 16:33:49.168 ERROR ERP.EMR - at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at java.security.AccessController.doPrivileged(Native Method)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at java.lang.Class.forName0(Native Method)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at java.lang.Class.forName(Class.java:266)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
10-15-2013 16:33:49.168 ERROR ERP.EMR - at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
10-15-2013 16:33:49.168 ERROR ERP.EMR - ... 25 more
10-15-2013 16:33:49.169 INFO ERP.EMR - VirtualIndex - listStatus done, vix.name=omniture, files.total=1, files.time.filtered=0, files.search.filtered=0, files.mr=1, elapsed=1638ms
10-15-2013 16:33:49.175 INFO ERP.EMR - SplunkMR$SearchHandler - The search couldn't find any matching data
10-15-2013 16:33:49.624 ERROR SearchOperator:stdin - Cannot consume data with unset stream_type
10-15-2013 16:33:49.625 ERROR ResultProvider - Error in 'SearchOperator:stdin': Cannot consume data with unset stream_type
Ledion,
Did you by any chance encounter the following problem when running "make"?
hadoop@ip-10-64-4-190:/home/hadoop/snappy-1.1.0 $ make
make all-am
make[1]: Entering directory `/home/hadoop/snappy-1.1.0'
source='snappy.cc' object='snappy.lo' libtool=yes \
DEPDIR=.deps depmode=none /bin/sh ./depcomp \
/bin/sh ./libtool --tag=CXX --mode=compile g++ -DHAVE_CONFIG_H -I. -c -o snappy.lo snappy.cc
libtool: compile: g++ -DHAVE_CONFIG_H -I. -c snappy.cc -o .libs/snappy.o
./libtool: line 1125: g++: command not found
make[1]: *** [snappy.lo] Error 1
make[1]: Leaving directory `/home/hadoop/snappy-1.1.0'
make: *** [all] Error 2
I just installed gcc on the server using:
sudo yum install gcc
hadoop@ip-10-64-4-190:/home/hadoop/snappy-1.1.0 $ which gcc
/usr/bin/gcc
Isn't g++ part of gcc?
g++ is part of the GCC project, but on RHEL/CentOS it ships as a separate package, so installing gcc alone does not provide it. Try installing g++ with the following command, then build Snappy again:
sudo yum install gcc-c++
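Before rerunning make, you can confirm the compiler is actually on the PATH:

```shell
# Report whether the C++ compiler is available before rebuilding snappy.
if command -v g++ >/dev/null 2>&1; then
  echo "g++ found at $(command -v g++)"
else
  echo "g++ not found; on RHEL/CentOS, install it with: sudo yum install gcc-c++"
fi
```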