TigerGraph Developer Edition - Timeout when starting KAFKA

KAFKA initialization timing out on gadmin start infra

  • I managed to install TigerGraph Developer Edition on Ubuntu 18 (running under Windows Subsystem for Linux on Windows 10).
  • It worked at least once and everything started fine. Now it just times out. I am sure I did not change anything.
...$ gadmin start infra
[   Info] Starting EXE
[   Info] Starting CTRL
[   Info] Starting ZK KAFKA IFM ETCD
[  Error] Timeout (failed to start KAFKA#1 by grpc; Timeout(1m0s) when Waiting executable KAFKA#1:checkBrokerIds to finish)

Can anyone point me in the right direction to fix this?

Thx

Did you delete or clean out any log folders? What about free disk space and memory on the VM?

You can check the Kafka log file here:
/home/tigergraph/tigergraph/kafka/kafka.out
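
The checks above (free space, memory, Kafka log) could be gathered in one pass with something like the sketch below. It is only an illustration: `count_log_errors` and `KAFKA_LOG_DIR` are hypothetical names, the default directory is taken from the path quoted above, and it assumes any `.log` files sit next to the `.out` file, which may not match your install.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of TigerGraph: quick resource and log check.
# Assumption: Kafka's .out/.log files live in this directory; adjust as needed.
KAFKA_LOG_DIR="${KAFKA_LOG_DIR:-/home/tigergraph/tigergraph/kafka}"

count_log_errors() {
    # Print how many lines of the given file contain " ERROR " (0 if the file is missing).
    local file="$1"
    if [ -f "$file" ]; then
        grep -c ' ERROR ' "$file" || true
    else
        echo 0
    fi
}

# Free disk space for the home file system, and memory if `free` is available.
df -h "$HOME"
command -v free >/dev/null 2>&1 && free -h

# Per-file error count for whatever log files exist in the directory.
for f in "$KAFKA_LOG_DIR"/*.out "$KAFKA_LOG_DIR"/*.log; do
    [ -e "$f" ] || continue
    echo "$f: $(count_log_errors "$f") ERROR line(s)"
done
```

A non-zero count only tells you where to look; the stack trace around the first ERROR line is usually what matters.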

Best,
Bruno

Thanks for replying, Bruno!

Memory looks good: 16 GB on the system and 9 GB available.
Disk space: there is enough free space on disk.

Log file check

The .out files are empty, but this is what I see in the Kafka.log file, repeated multiple times:

(kafka.server.KafkaConfig)
[2020-10-06 12:08:47,623] INFO [ThrottledChannelReaper-Produce]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-10-06 12:08:47,624] INFO [ThrottledChannelReaper-Fetch]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-10-06 12:08:47,626] INFO [ThrottledChannelReaper-Request]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2020-10-06 12:08:47,723] INFO Loading logs. (kafka.log.LogManager)
[2020-10-06 12:08:47,846] INFO [Log partition=EventInputQueue-0, dir=/home/tigergraph/tigergraph/data/kafka] Recovering unflushed segment 0 (kafka.log.Log)
[2020-10-06 12:08:47,850] INFO [Log partition=EventInputQueue-0, dir=/home/tigergraph/tigergraph/data/kafka] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)
[2020-10-06 12:08:47,947] ERROR Error while loading log dir /home/tigergraph/tigergraph/data/kafka (kafka.log.LogManager)
java.io.IOException: Invalid argument
        at java.io.RandomAccessFile.setLength(Native Method)
        at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:188)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
        at kafka.log.AbstractIndex.resize(AbstractIndex.scala:174)
        at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:240)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
        at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:240)
        at kafka.log.LogSegment.recover(LogSegment.scala:397)
        at kafka.log.Log.recoverSegment(Log.scala:493)
        at kafka.log.Log.recoverLog(Log.scala:608)
        at kafka.log.Log.$anonfun$loadSegments$3(Log.scala:568)
        at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
        at kafka.log.Log.retryOnOffsetOverflow(Log.scala:2076)
        at kafka.log.Log.loadSegments(Log.scala:568)
        at kafka.log.Log.<init>(Log.scala:285)
        at kafka.log.Log$.apply(Log.scala:2210)
        at kafka.log.LogManager.loadLog(LogManager.scala:275)
        at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:345)
        at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[2020-10-06 12:08:47,978] INFO [Log partition=EventOutputQueue-0, dir=/home/tigergraph/tigergraph/data/kafka] Recovering unflushed segment 0 (kafka.log.Log)
[2020-10-06 12:08:47,979] INFO [Log partition=EventOutputQueue-0, dir=/home/tigergraph/tigergraph/data/kafka] Loading producer state till offset 0 with message format version 2 (kafka.log.Log)
[2020-10-06 12:08:48,025] ERROR Error while loading log dir /home/tigergraph/tigergraph/data/kafka (kafka.log.LogManager)
java.io.IOException: Invalid argument
        at java.io.RandomAccessFile.setLength(Native Method)
        at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:188)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
        at kafka.log.AbstractIndex.resize(AbstractIndex.scala:174)
        at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:240)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
...
...

This looks like an unclean shutdown of the Kafka process. Did you restart Windows or kill the VM?

Since TigerGraph is not running, you can try the following: rename the /home/tigergraph/tigergraph/data/kafka folder to something else (e.g. kafka_old) and start TigerGraph again with gadmin start all.
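
As a rough sketch of that rename step, the function below moves the directory aside rather than deleting it, so the old data can still be inspected later. `move_kafka_data_aside` is a hypothetical helper name, and the timestamp fallback for an already existing kafka_old is my own addition, not something TigerGraph prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a Kafka data directory aside instead of deleting it.
move_kafka_data_aside() {
    local dir="${1%/}"          # strip a trailing slash, if any
    local backup="${dir}_old"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Avoid clobbering an earlier backup by appending a timestamp.
    if [ -e "$backup" ]; then
        backup="${dir}_old_$(date +%Y%m%d%H%M%S)"
    fi
    # Print the new location on success so the caller can record it.
    mv -- "$dir" "$backup" && echo "$backup"
}

# Example (run only while TigerGraph is stopped):
#   move_kafka_data_aside /home/tigergraph/tigergraph/data/kafka && gadmin start all
```

Once everything starts cleanly and you are sure nothing in the old folder is needed, it can be removed by hand.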

Awesome, that worked!
Thanks a lot, Bruno!

Is that because I did not run a shutdown command and instead may have just closed or shut down the VM?

I don't know exactly how Windows Subsystem for Linux works, but it seems that closing the VM kills all processes. If you do a clean shutdown, you will be on the safe side.
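
A clean shutdown could look roughly like the sketch below. It assumes `gadmin stop all` is the counterpart of the `gadmin start all` mentioned earlier in the thread; check `gadmin stop --help` on your version before relying on it. `stop_tigergraph` is a hypothetical wrapper name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: stop TigerGraph cleanly before closing the WSL session.
stop_tigergraph() {
    if ! command -v gadmin >/dev/null 2>&1; then
        echo "gadmin not found on PATH" >&2
        return 1
    fi
    # Assumption: "gadmin stop all" mirrors the "gadmin start all" used above.
    gadmin stop all
}

# Example: run this as the tigergraph user before closing the terminal or VM.
#   stop_tigergraph && echo "safe to close"
```

Running it before closing the WSL window gives Kafka a chance to flush and close its index files, which is the step the unclean shutdown skipped.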